How to Deploy YOLOv5 Model with OpenCV and Python

Captions
So here we see the results: these are all the object detections we're doing with our YOLOv5 small model. We can see all the persons here, even though the image is pretty blurred; it's still detecting pretty much all the persons in the image, even if we can only see the top of their heads or they have their backs turned to the camera.

Hey guys, and welcome to a new video in this neural network tutorial. In this video I'm going to show you how we can deploy a YOLOv5 model with OpenCV. In the previous videos we trained our own custom object detector with YOLOv5; in this one we're just going to deploy a YOLOv5 model with OpenCV, and in another video we'll do the same with the model that we trained our custom object detector on. But first of all, remember the Discord server, which I'll link in the description, where you can join and chat about computer vision, deep learning, AI and so on. You can always become a member of the channel if you want to support it with a small fee; everything goes towards creating more and better quality content here on the channel. Also, if you have problems in your own projects or applications, I can help the members of the channel with some guidance in their own projects. So thank you guys so much.

We're in Visual Studio Code here, and I'm going to show you how we deploy a YOLOv5 model with OpenCV. Basically, we're going to take a YouTube video, pass that video through our YOLOv5 model, and do object detection on all the objects we have in that video. But first I want to show you the different modules that we need to be able to create this and deploy our YOLOv5 model.

In this video we're going to use something called pafy, which we get with pip install pafy. We use this to take a URL from YouTube so we can grab the video and pass it through our YOLOv5 model. We also need youtube-dl, so we pip install youtube-dl as well. Once you've installed these two modules you can run the program; it will be linked in the description, so you can just take it from my GitHub and play around with it yourself. You just need to copy the URL of your own YouTube video, pass it into the object detector, and it will do object detection on that video and save the result to your directory. So you can take an arbitrary video, or even your own videos, and use those for object detection.

Those are the modules we need, and later on I'll show you some other things we need to do before we can run the program. First of all we're going to import torch, so make sure you have already installed PyTorch on your computer. If you have a GPU you can use that as well; you need to install CUDA from NVIDIA's website, and after you've installed CUDA you can use the GPU with PyTorch. You can also just use the CPU, but inference will go a lot faster on the GPU. So here I just want to show you how we can install PyTorch on our computer, whether you're using the CPU or the GPU.
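As a quick summary of the dependencies mentioned so far, here is a minimal sketch of an environment check; the install commands in the comments follow what is described above, and no specific versions are pinned in the video:

# Install the dependencies first, roughly as described in the video:
#   pip install pafy youtube-dl
#   pip install torch ...   (use the command generated at pytorch.org for your OS and CUDA version)
import cv2
import pafy   # wraps a YouTube URL so OpenCV can read the stream
import torch

print("OpenCV version:", cv2.__version__)
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())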
We're going to use PyTorch to deploy our YOLOv5 model, and the output from that model is just array data, so we can work with it as we have done in all the other videos with OpenCV, because it's just NumPy arrays that we're working with. The input to the model is also just a NumPy array, so we can pass OpenCV images through the model running in PyTorch on our GPU or CPU.

First we need to go to the PyTorch website and choose the build: go for the stable one, choose the operating system you're on (in this case I'm on Windows), and choose to install it with pip in the terminal or the command prompt. Then we specify the language, Python, and the CUDA version, which in my case is 11.3. We're using this one because I have a GPU on my computer, so the deployment, or rather the inference, will be a lot faster because we're utilizing the GPU instead of only the CPU. If you don't have a GPU on your laptop, you can just specify CPU here and pip install torch with the command at the bottom. We copy-paste the command that we get from PyTorch into the command prompt, run it, and it will download PyTorch to your computer with GPU support; just make sure that you already have CUDA 11.3 installed from NVIDIA's official website.

Going back into the code, we will now go through it. We're creating this object detection class where we do a few different things, like loading in our model and drawing the bounding boxes and the labels for all the objects that we have in the image. First of all we're importing PyTorch, NumPy, OpenCV, pafy and time, because we want to time how long it takes to run our images through our neural network, our YOLOv5 model.

We have a class called ObjectDetection; it implements the YOLOv5 model for inference on a YouTube video using OpenCV. At the bottom we're going to pass in a URL; the initialization of the class takes that URL, and we also need to specify the file name that we want to output the video with bounding boxes and labels to, so we can then play that file and see all the objects that we were able to detect in the video using the YOLOv5 model we deployed here in Python with OpenCV. So here we're just initializing the attributes of our class: we have the URL; we have the model, which we create with a function called load_model that loads in the model we want to use; we have the classes, which are just an attribute of the model, so inside the model we can get the individual classes, or labels, of our object detector; we set our output file equal to the output file that we specify in the constructor of the class; and then we specify the device that we want to use, so we're using CUDA if torch.cuda is available, or else, if we don't have a GPU or we haven't built PyTorch with GPU support, we're just going to use the CPU. It prints out the device that it is using, so you can see whether you're running inference on the GPU or the CPU.
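Put together, the constructor described above looks roughly like this. The exact attribute names are my assumptions; only the behaviour (URL, model, classes, output file, device selection) is taken from the walkthrough:

import time

import cv2
import numpy as np
import pafy
import torch


class ObjectDetection:
    """Implements a YOLOv5 model for inference on a YouTube video using OpenCV."""

    def __init__(self, url, out_file):
        self._url = url                                  # YouTube URL to run detection on
        self.model = self.load_model()                   # YOLOv5 model loaded via torch.hub (shown below)
        self.classes = self.model.names                  # class-index -> label names from the model
        self.out_file = out_file                         # file the annotated video is written to
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print("Using device:", self.device)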
Then we need a function to get the video from the URL. Here we just use pafy: we pass the URL into pafy, get a play variable for the stream, and make sure that it is not None, because if it is None we're not able to load the stream from the URL, which is just a YouTube video. Then we can open up a video capture with OpenCV, as we have done in all of the videos throughout this channel, but instead of passing the camera index to the video capture, which would be zero for a webcam, we pass in play.url. So instead of an index for our camera we pass in the URL of the video that we want to do object detection on.

Then we have the load_model function, still inside the class. It basically just creates the model: we set a variable equal to torch.hub.load, which is a function from PyTorch where we can load models, or neural networks, that we want to do inference with. After you've trained your own model you can also use this to load a model that you trained somewhere else, for example in Google Colab. If you want to know more details about how you can train your own YOLOv5 models, I have videos about that here on the channel, so make sure to check them out; or if you want to create a neural network from scratch, like convolutional neural networks, do inference and learn how neural networks work, check that tutorial out on the channel as well, it's really useful for getting started with deep learning and neural networks.

Here, though, we're loading a pre-trained YOLOv5 model, not a custom one. We go into 'ultralytics/yolov5', which is just the repository where the YOLO models are stored, and then we specify the model that we want to use; in this example we want the YOLOv5 small model. We could also specify the large model, the nano model, or something else, but this is a fairly good model: it's very accurate, but it's also really fast when we're doing inference. We also need to set pretrained equal to true, because we want to use this pre-trained model on all the different object categories in the COCO dataset; we want to detect cars, persons, pedestrians, all the different objects in the scene, so we need to specify pretrained. I'm going to create another video where we use a custom model that we created ourselves, as we did in one of the previous videos; there we would specify 'custom' and the path to the weights of the model, and then we can deploy our own model that way, open up the webcam, and use our own neural network with it. But again, I'll create another tutorial where we do that, with everything set up so you can run the code yourself afterwards. Finally we're just returning the model, so we can use this function to load the actual model.
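A minimal sketch of those two methods, continuing the class from above; the use of getbest() on the pafy object is an assumption, since any playable stream URL works with cv2.VideoCapture:

    # (continuing the ObjectDetection class sketched above)
    def get_video_from_url(self):
        """Open an OpenCV capture on a stream of the YouTube video behind self._url."""
        play = pafy.new(self._url).getbest(preftype="mp4")
        assert play is not None, "could not resolve a playable stream for this URL"
        return cv2.VideoCapture(play.url)

    def load_model(self):
        """Load the pretrained YOLOv5s model from the ultralytics hub repository."""
        model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
        return model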
Then we need a score_frame function. It takes a single frame as input and scores the frame using the YOLOv5 model; this is basically where we take an image and do a forward pass through our neural network, the YOLOv5 model that we just loaded. First of all we set our model to the device that we're using, so if we're using a GPU we take the model and put it on the GPU, or if we're using a CPU we specify the CPU. To actually do the inference we set frame equal to [frame]; that's just the format we need to be able to pass it through the model. Then we do a forward pass: we pass the frame through our model, and that gives us the results for that frame. So we take one single frame from the video, do predictions with the YOLO model, and all the bounding boxes and labels that it detects in that single image are stored in this results variable.

Then we need to get the labels and the coordinates of our bounding boxes, so we can later use them to draw on the images and see where we're detecting the different objects in the scene, together with the labels of those objects. We go inside the results: the labels are in results.xyxyn, where we take the zeroth index, all the rows, and just the last column; and we do the same thing for the coordinates, where we take all the rows and every column up until the last one, because we want all the coordinates that are stored in those columns, while the last column holds the labels for the corresponding bounding box coordinates. Then we return all the labels and all the coordinates that we have detected in the single frame that we passed through our model.

Now we go down to class_to_label. We pass in an x, and for a given label value it returns the corresponding string label. We go inside the classes attribute that we declared and use the label as an index; we have one index for person, one index for car, and so on, and then we return that string. So we're basically just converting from a class index to a label that we can then use when we're plotting our bounding boxes.
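Based on that description, score_frame and class_to_label look roughly like this; the note about the confidence column reflects how YOLOv5's xyxyn output is laid out:

    # (continuing the ObjectDetection class sketched above)
    def score_frame(self, frame):
        """Run a single frame through the YOLOv5 model and return labels and box data."""
        self.model.to(self.device)
        frame = [frame]                      # the model expects a batch (list) of images
        results = self.model(frame)          # forward pass through YOLOv5
        labels = results.xyxyn[0][:, -1]     # last column: class index per detection
        cord = results.xyxyn[0][:, :-1]      # normalized x1, y1, x2, y2 (plus confidence)
        return labels, cord

    def class_to_label(self, x):
        """Convert a numeric class index into its human-readable string label."""
        return self.classes[int(x)]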
The last part, before we throw a YouTube video into this class, do predictions and see the results, is that we also need to plot the bounding boxes that we got from the results variable I just showed you. The plot_boxes function takes a frame and the results as input and plots the bounding boxes and labels around the objects. The results parameter contains the labels and the coordinates predicted by the model on the given frame, the frame parameter is what has been scored, and the function returns the frame with the bounding boxes and labels plotted on it. When we have this frame, we can write it out to a file, and by concatenating all the frames that we run detection on we can generate the video again, but with all the objects detected in the scene.

Here we take our labels and the coordinates for the bounding boxes, and then n, which is just the number of labels, or detections, that we found in the image; so if we have 10 cars in the image, n will be 10. We also take x_shape and y_shape, the dimensions of the frame, and then we have a for loop running through all the detections that we found in the image. For each one we compute x1, y1, x2 and y2, which is just the bounding box around the object, we specify the colour, because we want to draw all the bounding boxes in green, we draw a rectangle around each object that we're detecting in the frame, and then we put out the text: we pass in the label so we display the bounding box and the corresponding label on top of it. So if we take a person, for example, we have a bounding box around that person and it says "person" above it. Then we return the frame, which we can use later on to display somewhere else, for example with OpenCV, or just save to a file.
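A sketch of plot_boxes as described, scaling the normalized coordinates back to pixel values with the frame dimensions; the font and thickness settings are assumptions, only the green-box style is mentioned above:

    # (continuing the ObjectDetection class sketched above)
    def plot_boxes(self, results, frame):
        """Draw the predicted bounding boxes and labels onto the frame and return it."""
        labels, cord = results
        n = len(labels)                                   # number of detections in this frame
        x_shape, y_shape = frame.shape[1], frame.shape[0]
        for i in range(n):
            row = cord[i]
            # scale the normalized corners back to pixel coordinates
            x1, y1 = int(row[0] * x_shape), int(row[1] * y_shape)
            x2, y2 = int(row[2] * x_shape), int(row[3] * y_shape)
            green = (0, 255, 0)
            cv2.rectangle(frame, (x1, y1), (x2, y2), green, 2)
            cv2.putText(frame, self.class_to_label(labels[i]), (x1, y1),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.9, green, 2)
        return frame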
The last thing we're going to do is the __call__ function. This function is executed automatically when we call the object that we create. First of all we get the player, so we get the video from the URL, and we make sure that the video is actually opened, so we can take in each of the frames from the YouTube video. Then we get x_shape and y_shape, the dimensions that we want to save our video with, and then we have the FourCC, "MJPG", so we can use the OpenCV video writer; it's basically just a video writer so we can write out all the frames that we're doing object detection on to our file, which we can later play to see the objects detected in the YouTube video we loaded in. So our output is a video writer set up with our output file name, the FourCC and some other parameters, like the dimensions of the video that we want to create.

Then we have the final while loop: as long as we have frames in the video, we do all of these things and call the functions that we just went through. We start a timer, so we're timing how long it takes to do object detection on each individual frame; we can use that for a lot of different things, and we can even compare GPU time and CPU time. It will go a lot slower if you're using the CPU; on a fairly good GPU it runs pretty fast, and the inference time will be fairly good even with large images and maybe 10 or 20 objects to detect in a single frame. So we read a frame from the video, pass it through score_frame to get the results, then we throw the results into plot_boxes and plot our bounding boxes with the corresponding labels onto that frame. We end the timer, calculate the number of frames per second, print it out so we can see how long the object detection took for that frame, and then we write the frame out to the video file that we want to generate. When the program is done we can go into that file, play it, and see all the objects detected in the whole YouTube video.

So this is basically everything we have to implement; we have implemented the whole class that does everything for us. You can go to my GitHub, take the code and play around with it: find your own YouTube videos, pass them through and see what types of objects it's able to detect. You just need to specify the URL, which is the first input argument to ObjectDetection when we're initializing, or instantiating, the class, and then you need to specify the output file name; in this example we're just going to specify video2.avi. Then we create an instance of our object, and it runs all the code, because the __call__ function is called when we call this ObjectDetection instance down at the bottom.

Now we're going to run the program. We'll get an error here, because YouTube has actually removed the dislike count; we get a KeyError because it can't find the dislike count. But if we go up to the top we can see that when the program loads we're using the YOLOv5 model, we're using CUDA 11.3, and we have a device connected, so right now we're using the GPU, an NVIDIA GeForce GTX 1060. We also get a summary of our model: 213 layers and around seven million parameters in the neural network that we pass our image through, and we print out the device that we're using, so in this example CUDA. If it says CPU, it's because you either haven't installed PyTorch with GPU support, with CUDA, or you need to download CUDA from the official NVIDIA website first.

You will still get the same error, though, because youtube-dl, which we pip installed at the start of the video, is a bit outdated here. We can open the file from the error message in the editor, just hold Ctrl and click, and because YouTube has removed the dislike count we need to delete that line before we can run our program. So just go in, delete it, save the file, close it again, and when we run the program now it works, and it does object detection on the YouTube video that we initialized here. It prints the device being used, and then we see the frames per second that we get for each of the individual frames loaded in from the YouTube video. This will take maybe around five to ten minutes to run, because we need to run through a three-minute video and do object detection on all of those frames, but we get around 40 frames per second when we're doing inference on the GPU.
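Pulled together, the __call__ method and the instantiation at the bottom of the script look roughly like this; the writer's frame rate (20 here) and the placeholder URL are assumptions, not values shown in the video:

    # (continuing the ObjectDetection class sketched above)
    def __call__(self):
        """Read the YouTube stream frame by frame, run detection, and write the result."""
        player = self.get_video_from_url()
        assert player.isOpened()
        x_shape = int(player.get(cv2.CAP_PROP_FRAME_WIDTH))
        y_shape = int(player.get(cv2.CAP_PROP_FRAME_HEIGHT))
        four_cc = cv2.VideoWriter_fourcc(*"MJPG")
        out = cv2.VideoWriter(self.out_file, four_cc, 20, (x_shape, y_shape))
        while True:
            start_time = time.time()
            ret, frame = player.read()
            if not ret:                      # no more frames in the video
                break
            results = self.score_frame(frame)
            frame = self.plot_boxes(results, frame)
            end_time = time.time()
            fps = 1 / (end_time - start_time)
            print(f"Frames per second: {fps:.1f}")
            out.write(frame)
        player.release()
        out.release()


# Hypothetical URL; in the video a real YouTube link is pasted in here.
detection = ObjectDetection("https://www.youtube.com/watch?v=<video-id>", "video2.avi")
detection()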
That's actually really fast when we're talking about a neural network with seven million parameters and high-resolution images that we need to pass through the model, even though we're maybe detecting 10, 20 or 30 objects in each of the frames that we go through. So now we're done doing object detection on the whole YouTube video. Over here to the left we have the video file, which is the output we just got: we loaded in all the frames and stored the result of our object detection in this video file. We reveal it in the file explorer and play the video, so we can see how it detects the people in the image.

So here we see the results: these are all the object detections that we're doing with our YOLOv5 small model. We can see all the persons, even though the image is pretty blurred; it's still detecting pretty much all the persons in the image, even if we can only see the top of their heads and even when they have their backs turned to the camera. It's still a fairly good object detector, and again, this is the small YOLOv5 model that we're running. In this video we get a couple of different scenes; it's around three minutes long. Here again it's pretty blurry, but we can still see all the detections of the persons. I'll just fast-forward a bit. Here we get another scene; again we're detecting all the persons, and sometimes handbags, suitcases, a sports ball. We get all of these objects detected in the image, and this is even a really high-resolution image that we're passing through the network, and we still get around 40 frames per second because we're utilizing the GPU, and I don't really have that good of a GPU, so you could actually get better performance with a better one. Here it detects handbags, a suitcase, persons. I'll fast-forward again. Here we get another scene: we're detecting a truck, a car, all the persons. Again it's pretty blurry, but we're detecting a car and tracking it pretty well while it's driving through the image from one side to the other. We have this fire hydrant detected as well, all the persons, a bicycle, a person on top of a bicycle, a truck; we see all these different things, and even when we can only see a partial view of a person, it still detects them as a person. Up at the top we can only see their feet, but they are still detected as persons, like this guy walking past at the top. We're detecting all the persons, and even the handbags in their hands sometimes, when it's pretty obvious that it actually is a handbag.

Here is another scene, probably the last one: we have a traffic light up here and a lot of persons, and we also have some false positives over here. When we're using the YOLOv5 model we also get a confidence score, so we can threshold out some of these detections by saying that if the confidence is under some threshold value, we won't consider it an object; we need to be more than, for example, 50, 60 or 70 percent certain that it actually is an object before we display it and draw the bounding boxes.
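That confidence-based filtering isn't shown in the code walkthrough itself, but a sketch of how it could be applied to the score_frame output looks like this; the 0.6 threshold is just an example value in the 50-70 percent range mentioned:

# cord rows from score_frame() are [x1, y1, x2, y2, confidence] (normalized),
# so column 4 holds the confidence score of each detection.
CONF_THRESHOLD = 0.6                      # example value; tune per application

labels, cord = detection.score_frame(frame)
keep = cord[:, 4] >= CONF_THRESHOLD       # boolean mask of confident detections
labels, cord = labels[keep], cord[keep]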
As an example, I'll just move the video here a bit: over here to the right we can only see the wheel of a car, but it still detects it as a car in the image, so we can see the performance is actually really good. So thank you guys for watching this video. Again, you can basically just take a YouTube video, pass it in here, specify the output file name that you want, run this Python script on an arbitrary video, wait five to ten minutes, and then you can see all the detections in the video afterwards. Thank you guys for watching this video, and remember the subscribe button and the bell notification under the video. Also like this video if you like the content and want more in the future; it really helps me and the YouTube channel out in a massive way. I'm currently doing this computer vision tutorial where we're talking about basic image operations, camera calibration and stereo vision: how we can use stereo vision to get depth information about our images and the environment, create point clouds from stereo vision, and do different kinds of operations. It's a really awesome tutorial; I'll link to it up here, or else I'll just see you in the next video, guys. Bye for now.
Info
Channel: Nicolai Nielsen
Views: 27,471
Keywords: yolov5, yolov5 neural network, yolov5 custom object detection, yolov5 object detection, yolov5 tutorial, object detection yolo, object detection pytorch, object detection python, opencv object detection, opencv yolov5, opencv python yolov5, object detector, object detection yolov5, opencv, yolov5 dataset, neural networks, detect objects with yolov5, yolov5 opencv, opencv dnn, opencv neural networks, deploy yolov5 model, deploy yolov5 model opencv, how to deploy yolov5
Id: 3wdqO_vYMpA
Length: 23min 35sec (1415 seconds)
Published: Wed Jan 12 2022