Real-Time Object Detection with YOLOv7 and OpenCV: From Training to Deployment

Video Statistics and Information

Captions
Hey guys, and welcome to a new video. In this video we're going to do custom object detection with the new state-of-the-art YOLOv7 model, so we're going to take a model and deploy it with OpenCV. I actually have a course, and this is one of the videos inside it, so if you're interested in that, if you want to know how to generate your own custom dataset, label your dataset, and train the models, we'll jump into Google Colab and train an actual YOLOv7 model. I also go over some of the theory behind the architecture of the YOLOv7 model, and then we'll see the results. So we basically have our dataset, we train our model, and then we export it both to the PyTorch format and to the ONNX format. In this video we're going to deploy it in the PyTorch format with OpenCV and run live inference on a webcam, where we detect these objects live with the YOLOv7 model. If you're interested in the whole course, where we go over everything from creating the model, generating our own custom dataset, labeling the dataset, and training the model, I have 20-plus videos about that in my course, along with quizzes and things like that, and the code will also be available inside the course. So we're now ready to do inference with our custom YOLOv7 model. Here we've downloaded our weights: we've converted to the ONNX format and we also have the PyTorch weight file, so we have the best trained model from our YOLOv7 training on the dataset, both best.pt and best.onnx. To start with, we're going to do inference in OpenCV, both with the .pt file and with the ONNX file. So basically here we're just going to use
the detect script that I modified to run live webcam inference only. First of all, here we have all the imports that we need; these come from the YOLOv7 repository, and I made changes so we only run on the webcam. So basically here we have our detect function, and we pass in the source, which will be the index of our webcam, and our weights, which will be the weight file. Then we load those weights into our model. That's only if we're not going to use ONNX, which we're going to do in the next video. There are some benefits to the ONNX format over the PT format: you can implement ONNX models directly in OpenCV without using these utility functions from YOLOv7, you can deploy them on edge devices, and you can use them with all the other frameworks that can load ONNX models, for example ONNX Runtime or OpenCV. So basically here we're just going to open up our webcam. We also have our device, so if you have a GPU available you can use that, or else it will just use the CPU. We also need to specify the image size and the intersection-over-union threshold, which controls how much overlap we allow between the bounding boxes on top of our objects. Then we also have a confidence threshold: if we have too many false positives we can increase the confidence threshold, and if we don't get any detections at all we should lower it so we get more detections. First of all, we're going to set up our webcam, so we just check if the source is numeric, and then we can go down
and select our device. Basically here we're just setting up the device, so we can choose whether we're using CUDA or the CPU, and we specify that when we call the function. Then we're going to load the model: we have these utility functions here, and we just pass in the weights and the device, so the model is loaded onto our device if we're using CUDA. Then we have our model variable, or the instance of our model, and we set up the stride and the image size for the images that we want to pass through the model, so we have our specified image size and the model's stride. Then we check if we're using half precision, and if so we set the model to 16-bit floating point. That's for when we're using, for example, the GPU; if we're using the CPU we don't need half precision. Here we set up the data loader: we check if we're using the webcam, and whether we're going to view the image. These are just utility functions wrapping OpenCV functions. Again, this is just a script from the YOLOv7 GitHub repository, modified for our project and our own custom YOLOv7 detection, so we don't have all the other returns and things that we don't need. Afterwards I'm going to show you how we can deploy the model with ONNX, where we just open up our model with ONNX and do the inference with that, so we don't depend on all these utility functions from YOLOv7. Then we set cudnn.benchmark to true to speed up inference at constant image size, which is the case in
our project. Then we're going to load our stream from our source. Basically here we just create our dataset, which resizes the image to the correct dimensions: we pass our webcam in as the source, it loads an image from that webcam, resizes it, and returns it through the dataset. Then we can take this dataset, which yields images, and feed those through our model. Here we get the names and the colors of the classes that we want to detect, so we have these different cups that we want to detect in our image, and then we can run inference directly. If the device type here is not the CPU, for example if we're running on the GPU, we need to move our model and our data to our device, which we do here; we'll do it with the data down at the bottom later on. Basically we're going to run a for loop over our whole dataset: we take the video capture, the path, and the images, but in this example we're just going to have the images from our webcam. Then we set up a torch tensor and move the tensor of the image to our CUDA device if we're using CUDA; if we're using the CPU it will just run on the CPU, but I'm going to test it with CUDA in this video, and then we can see how fast it actually runs when we're doing inference. We also do some preprocessing where we normalize our image, so we have values between zero and one instead of zero and 255, which is a requirement for the model before we pass the image in. Then we're going to do some warm-up runs with our images, and then we can go down directly to
the inference. So basically here we have our model, we pass our image into the model, and we take the zeroth element of the output, which will be the predictions. We do a forward pass with our model and get the predictions, which contain all the bounding boxes, all the classes, and the confidence scores. Then we can apply non-maximum suppression: if we want more reliable and more robust bounding boxes and predictions, we pass our predictions, our confidence threshold, and our intersection-over-union threshold into non-maximum suppression, and we get back our new, better predictions from the model. Then we can process the detections. We won't cover that in detail, but basically it takes the predictions, iterates over all the detections, rescales the coordinates back to the original image size, and plots all the bounding boxes and class labels; we can also get the confidence score out. So basically here we're just drawing the bounding box and the label for the object that we're detecting. Then we call imshow, so we're just going to show what we're detecting in the image. We also have some timers going on so we can see how fast it actually does inference, so we can time the inference speed on our GPU or on the CPU. Then we have our main function: we check if CUDA is available, and if it is we use the CUDA device, or else we're just going to use the CPU. We print the device here so we actually know what device we're using, and then with
torch.no_grad we're not calculating gradients, because we're just doing inference, not training right now. Then we call the detect function: we pass in the index of our webcam, which is zero, and our weight file, which will be the PyTorch weight file, so not the one that we converted to ONNX but the direct output from training our YOLOv7 model. We pass in the device, which is CUDA, the image size that we trained our model on, the intersection-over-union threshold, and the confidence threshold over here to the right. So now we've been through the whole code; this is how we can do inference with the YOLOv7 model, and now we're ready to run the program and see the results. So now we're running the program. First of all we can see the GPU that we're using, so here we can see that we're using an RTX 4090. We're fusing the layers, we get a model summary, and we also get the image size that we load in. When we actually do inference we will also get the inference speed, but here we just print out the device that we're using, so we're using CUDA right now. Then if we take the camera up here, we actually start doing the detections on the cups. Here in the front we're detecting the Cocker cup with a confidence score of about 0.50, we have the Halloween cup with a confidence score of about 0.6, and we also have the white cup over here to the left at 0.75, so it's really confident with that type of object. Again, this is a really good result, and it runs really fast: if I look down at the bottom, we're running at around 100 frames per second, so around 10 milliseconds per frame on this RTX 4090. If we go over here to the left we can see that we have this hand-painted cup; we
can also detect that one. We can try to rotate it around, and we can see that we still keep track of this cup even while rotating it. If we take it up in the hand and rotate it so we see it from the top, we lose the detection, since we didn't really have that many samples of the cups from the top, but as long as we can see the side of the cup we get a really high confidence that this is a hand-painted cup, which is correct. And again, remember that we only had around 100 images in our dataset, and now we're able to do this custom object detection with our YOLOv7 model. The YOLOv7 model is really good: it has really high accuracy, as we can see here, and it keeps track from frame to frame. Across all these frames it holds on to the cup and doesn't lose track, even while I'm moving the camera around; it snaps onto the object and keeps really high accuracy without losing track. And it's fast, really fast, so this is a really impressive model, with way better performance compared to, for example, YOLOv5 and all the other object detectors. Again, you saw how easy it was to generate your dataset with Roboflow, do the annotation, train the model, and then just do inference. Right now we're doing inference with the PyTorch script, and you will get access to all of the code, so you can directly run it on your own computer, whether you're using the GPU or the CPU. These are some really nice results. We can try with some of the other cups here, so we have the white cup again: most of the images that we took were with the cups a bit further away, but just to make sure that it also works when we take them closer, it does. So it's actually a pretty robust and general model; we can even rotate it around and it still keeps track of it. We can try
with some of the other cups here as well. This Halloween cup is a bit more unique: even if I put my hand around it, we start to lose track of it, because we haven't really trained on that, but when the cup handle comes over to the right, the model is much more certain that this is not just any cup but the Halloween cup. We can also take the Cocker cup, which is pretty unique as well; it keeps the detections even when we move it around. And even when the white cup is occluded in the background, the model still does a really good job at detecting this object; we can see how much it's actually occluded in the background and it still detects it. I'm actually pretty impressed with these results from this model, which we only trained for 50 epochs. Again, you can train it for more epochs and see if you can increase the performance, and you can also try it with some of the larger models that we have talked about, but these are some really impressive results. I'm also really excited to see the results when we do inference with the ONNX model, to see if we actually get better performance or faster inference times. So this is really cool; I'm honestly blown away by the results that we get here. You can do the same with all kinds of other custom objects, so I hope you guys are creating some really nice custom datasets with other objects instead of cups: you could do cars, leaves, or maybe apples hanging on a tree, and do custom detection of different kinds of fruits and all these different things. It's really cool, and I'm just really excited to see the results with the ONNX model.
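The intersection-over-union threshold discussed in the walkthrough above measures how much two bounding boxes overlap: the area of their intersection divided by the area of their union. A minimal pure-Python sketch, assuming boxes in the `[x1, y1, x2, y2]` pixel format that YOLOv7 uses after postprocessing:

```python
def iou(box_a, box_b):
    # Boxes are [x1, y1, x2, y2] in pixel coordinates (xyxy format).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap at all.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes give an IoU of 1.0 and disjoint boxes give 0.0, which is why raising the IoU threshold keeps more overlapping boxes and lowering it merges them more aggressively.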
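The stride and image-size setup mentioned above boils down to rounding the requested size to a multiple of the model's maximum stride (typically 32 for YOLOv7), since the network downsamples by that factor overall. A rough re-implementation, with the behavior assumed from the YOLOv7 repository's `check_img_size` utility:

```python
import math

def check_img_size(img_size, stride=32):
    # Round the requested size up to the nearest multiple of the model
    # stride, since the network downsamples by `stride` overall.
    new_size = max(int(math.ceil(img_size / stride) * stride), stride)
    if new_size != img_size:
        print(f"img-size {img_size} adjusted to {new_size} (multiple of {stride})")
    return new_size
```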
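The preprocessing described above, scaling pixel values from 0–255 down to 0–1 before the frame goes into the model, can be sketched like this. The real script works on torch tensors, but a numpy version shows the same steps, assuming a BGR frame as `cv2.VideoCapture` returns it:

```python
import numpy as np

def preprocess(frame):
    # frame: HxWx3 uint8 image from cv2.VideoCapture (BGR channel order).
    img = frame[:, :, ::-1]            # BGR -> RGB
    img = img.transpose(2, 0, 1)       # HWC -> CHW, as the model expects
    img = np.ascontiguousarray(img, dtype=np.float32)
    img /= 255.0                       # scale 0-255 -> 0-1 for the model
    return img[None]                   # add a batch dimension: 1x3xHxW
```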
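Non-maximum suppression, applied to the raw predictions in the walkthrough above, is a greedy procedure: keep the highest-scoring box, drop every remaining box that overlaps it beyond the IoU threshold, and repeat. A simplified single-class sketch; the actual YOLOv7 utility additionally applies the confidence threshold and handles per-class suppression:

```python
def nms(boxes, scores, iou_thres=0.45):
    # boxes: list of [x1, y1, x2, y2]; scores: matching confidence scores.
    # Returns the indices of the boxes that survive suppression.
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring remaining box
        keep.append(best)
        # Drop every box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thres]
    return keep
```

Two near-identical boxes on the same cup collapse to the higher-scoring one, while boxes on different cups are far apart and all survive.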
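The throughput quoted above follows directly from the per-frame latency the script's timers print, since frames per second is just the reciprocal of seconds per frame:

```python
def fps_from_latency(latency_ms):
    # Convert per-frame inference latency in milliseconds to frames per second.
    return 1000.0 / latency_ms

print(fps_from_latency(10.0))  # 10 ms per frame -> 100.0 frames per second
```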
Info
Channel: Nicolai Nielsen
Views: 15,274
Keywords: yolov7, yolov7 neural network, yolov7 custom object detection, yolov7 object detection, yolov7 tutorial, object detection yolo, object detection pytorch, object detection python, opencv object detection, opencv yolov7, opencv python yolov7, object detector, object detection yolov7, opencv, yolov7 dataset, detect objects with yolov7, yolov7 opencv, opencv dnn, opencv neural networks, deploy yolov7 model, deploy yolov7 model opencv, how to deploy yolov7, yolov7 vs yolov5
Id: XzUMigbYRUI
Length: 13min 43sec (823 seconds)
Published: Mon Nov 21 2022