3D Object Detection and Pose Estimation with Deep Learning in OpenCV Python

Captions
So these are the results that we get. Now we're just detecting this cup here in the image frame. As I showed you in the code, we actually draw the name of the class that we're detecting, we're drawing the 2D bounding box, and then we're also displaying the pose of the object that we're estimating. So this is a six-dimensional object detection: here we have our rotation, and then we also have our translation over here to the right.

Hey guys, welcome to a new video in this computer vision tutorial. In this video we're going to do 3D object detection and pose estimation of that object. We're going to do it on a live camera: we will open it up with OpenCV, and then we're going to use something called EfficientPose. So basically we just have a model that is trying to do 3D object detection and also estimate the pose of the objects that we detect in our frame.

But first of all, remember to hit the subscribe button and the notification bell under the video. Only ten percent of you guys who watch these videos are actually subscribed to the channel. It's just a single click and it helps me and the channel out in a massive way. You can also become a member of the channel if you want to support the channel with a small monthly fee; everything will go to creating more and better quality content here on the channel. Also, if you're a member of the channel, I can help you out if you have some problems in your own projects and give some guidance and so on. So thank you guys.

So first of all, in this video we're going to start in this GitHub repository for EfficientPose. This is basically the code that I've just cloned, and then I have also made this inference on a live webcam, where we're taking specific objects that we want to do 3D pose estimation of and also detection of. Here we see the different kinds of files that we need, but down here at the bottom we can actually see what it is. So this is
called EfficientPose. We're using deep learning to actually do 3D object detection with bounding boxes. As we can see down here at the bottom, we have some 3D bounding boxes, and we also estimate the pose of each of the individual objects that we detect in the scene. We can see the number of frames per second that we get over here, with different values. This is actually just a Keras implementation, and I'm going to show you guys how we can set it up and how we can run the inference with a live webcam on your own computer. All the code will be on my GitHub, so you can go to my GitHub link under the video, clone the repository, and run it on your own computer, as I'm going to show you.

So we need some installation first of all. I'll basically just make a reference to this GitHub repository. First of all you need to clone this repository, or my repository on my GitHub, and then you're going to create a new environment. I'm using Anaconda. We're going to create a new environment with Anaconda because this is running on TensorFlow version 1, where the newer things are actually running on version 2 of TensorFlow, and it's really hard to go from TensorFlow version 1 to 2. So you might need to install TensorFlow version 1 here, and it is best to just create a new environment for your specific project. We can also use a GPU if you have a GPU available on your computer, so we can speed up the inference. It will likely be way, way faster if you can run your inference on the GPU.

So we need to install TensorFlow, then we need to go into the repository and install the different dependencies that we have. We have OpenCV, NumPy, TensorFlow and so on that we're going to use. And then we can basically just compile the Cython modules, because we're
going to use Cython so we can actually run this code and also run it faster. So we need to compile some Cython modules that will also be used. Basically just follow these steps here for installing these different things. You also need Microsoft Visual Studio so you can actually compile the Cython modules as well. But you can just follow this guide here and then you can get it up and started.

Here we have the datasets and the pre-trained weights. In this video we're just going to use some pre-trained weights from this repository. We can also do our own training: basically we just have a path to our dataset, we specify where we want to store the weight file, and then we call this Python script, train.py, and we can actually train our own model on this dataset. For the pre-trained models they're using a dataset called Linemod and also Occlusion, and I think they're also using the COCO dataset for 3D object detection as well.

We can also see that for inference we can run their inference Python file. We can also do it on a webcam, which is the Python file that I have edited so we can run it with OpenCV on your own computer, take specific objects, and actually do object detection with that. So here we can see that we get both 3D pose estimation and detection of the objects in the scene, or from the images that we have from our camera.

So let's now jump into the code. I'm just going to go over the code kind of fast, then we're going to run it and see the actual results that we get by doing 3D object detection and pose estimation of those objects. So first of all here we have our Anaconda environment. I've just created my Anaconda environment; we see here
I have an environment called EfficientPose, and then we basically just have Python 3.7, and I installed TensorFlow 1-point-something with GPU support, so we're going to run this on the GPU. So first of all I'm just going to choose this environment.

We're going to import the different modules that we need. We need OpenCV to open up our webcam, NumPy, some math modules, and also TensorFlow. Then down here at the bottom we're going to import the model, so we have the function to build our EfficientPose model, and we also have some utils to pre-process the image before we can pass it into our EfficientNet. We're using an EfficientNet structure as a backbone for extracting features, and then they build this pose estimation neural network on top of that EfficientNet to try to do the six-dimensional, or 6D, pose estimation of our objects in the scene. They also have some utils for visualization, so we can just draw the detections that we're detecting in our image.

Then we have our main function, where we can run EfficientPose in inference mode live on a webcam. We're just going to open up a webcam, load the images, pass each image through the model, and then we get the results out and display those results in a window from OpenCV. Here we see that we set CUDA visible devices equal to zero, so we're going to use the GPU if we actually have CUDA installed and we also have an NVIDIA graphics card that we can run our inference on.

Then we can specify the path to our weights. Here we see we have different weights if we go to the weight files up here at the top. We have a folder called weights; we just go inside that, and if you want to use the weights trained on the COCO dataset, or Linemod, or the Occlusion dataset, we can go in here and choose between the different kinds of
pre-trained models that we have. So this weight here is going to go with EfficientDet D0. We're just going to specify the path to our weights. Then we can also specify a save path if we want to save our results somewhere in a directory or folder, and we can also specify the image extension and so on. We're not going to do that in this video because we're just going to open up a live webcam and run it.

Then we're going to open up our COCO labels text file. We basically just read in all the class names that we have in the COCO dataset so we can display what we are detecting later on when we actually run our model. This will just load in all the labels from the COCO dataset.

Then we can set up a score threshold for our confidence score. If we are above this threshold, we're confident that this is a good detection, and we're going to say this is a detection. If the confidence score of our detection is lower than the threshold, then we will just exclude that detection and not take care of it. You can play around with this threshold: you can lower it a bit if you don't get any detections at all, or if you get too many false positives you can increase the threshold value as well.

Then we're just going to have some booleans for drawing a bounding box in 2D and for whether you want to draw the name of the detection.

First of all we need to get our camera matrix. If you want more exact results, and if you're creating your own dataset, you should calibrate your camera yourself and get the camera matrix for your own camera to get better results. But in this video we're just going to go with the example, where we use the Linemod camera matrix, so the intrinsic parameters of the
camera that is used for this dataset. Then we also have the names for the 3D bounding boxes, so if you want to draw the 3D bounding boxes we'll also get those for the Linemod dataset. In this video we're just going to draw 2D boxes around our objects and then do pose estimation of those objects, because we need to actually detect the items from the Linemod dataset if we want to draw these 3D bounding boxes on top of our detections. Then we also have our classes for our 3D boxes, the same as for 2D, if you want to display what type of objects we are detecting in our image frame. We also get the number of classes.

Then we're just going to build our model and load the weights. Here we basically just pass in our arguments: we have the number of classes, we have a phi value, we have the confidence score threshold, and then the path to our weights. This build model and load weights function will just build the EfficientNet model that I showed you, and then it's going to load the weights into the model. So it's basically just creating our neural network and then loading the weights into it. Then we have both our model and the image size.

Then we can open up our webcam with the video capture. We're just going to take the zero index because this is the first camera attached to my computer. Then we have our webcam open, and we're just printing out that we're starting the inference.

Then we have a while loop running as long as we're not terminating our program by hitting Q or some other key. If you hit Q, we'll just terminate the program. Then we can load in an image from the webcam: we have webcam.read, we read in our image frame from the webcam and store it in this image variable, and then we can also check, or we have
this boolean, got image. If we got the image we will just continue, and if not we're going to terminate our program as well, because then we can't read any frames from our webcam.

Here we're just going to make a copy of our image, so we have our original image that we can display things on later, because the image that we read in from our webcam we're going to pass into our pre-process method. This is basically just going to pre-process our image: it's going to resize it with the image size, and it's going to use the camera's intrinsic parameters and also the translation scale norm. So we're pre-processing our image, and we get back the scale and also the input list. This will be the input list that we're going to pass through our neural network, this model that we created up here with the build model and load weights method.

Then we can actually just do our prediction. So first of all we build a model and load the weights into it, then we read in an image, we pre-process it, and then we pass that pre-processed image through our model. At the end of the model we get an output, and then we can do post-processing of that output, draw different kinds of information, and show the confidence score and the labels of the 3D objects that we have detected.

So now we're going to take the prediction step. This method here will return the boxes, so the bounding boxes; the scores, so the confidence score of how confident we are that we've detected the correct objects; and the labels of those objects. We also have the rotations and the translations, so this will be the pose estimation of each of the individual objects that we detect in the scene. We get these results by just calling model.predict_on_batch, and then we just throw in the input list that we
have from our pre-processed image.

Then we can do some post-processing. We just pass in the outputs from our neural network, post-process them, and get the results back at the end. Then we actually have everything, and we can just draw the detections. We call this method that we imported at the start of the program. So we draw the detections: we pass in the original image, the boxes, scores, labels, and also the poses, so the rotation and the translation. If you combine those you get a transformation, this six-dimensional object detection, or pose estimation, of our objects. We also have the class-to-bounding-boxes mapping, our camera matrix, the label-to-name mapping, whether you want to draw the 2D bounding box, and whether you want to draw the name on our image.

Then down here we're just going to have a for loop running through all the labels that we have. Label 46 here in the COCO dataset is a cup. You can choose any label from the COCO dataset: you can just google the COCO dataset and get the indexes, or you can go into the file that we loaded in at the start. Here we have the text file, and if we go down to 46 we can see that we get a cup. We start from the zero index, that's why we have 46, so we have to go one number lower. But then you can basically just choose between all of these different indexes. You can also detect multiple objects in your scene, but if you just want to detect one specific object you go in here and take its index. You can also have multiple indexes, where you have a list and run through all the detections checking if the labels correspond to any of these. So we have a lot of different objects that we can detect from this COCO dataset. So now here we're just going to detect
like, if we detect a cup in our image, we're just going to put out the text here that we have detected a cup, and then we're also going to put out the text for our rotation and our translation for that cup that we have detected in the image. So basically we're just showing the 2D bounding box of the object that we detected, and then we show the pose estimation of that object in the scene.

Then down here we can just display the image with our predictions. We're just going to use imshow, "image with predictions", and pass in the original image, where we've also put this text for our pose estimation. And then down here we basically just have our webcam release and so on if we hit Q on the keyboard. That's basically everything that we need to go through here.

So this is our main function, and then down here at the bottom we just have this get Linemod camera matrix function, where we have the camera's intrinsic parameters. You can specify your own camera parameters here if you want to get more precise results. We can also get the Linemod 3D bounding boxes, so if you want to detect any of these objects we can display the 3D bounding boxes on top of our detected objects instead of only the 2D bounding boxes. And then there are all these other functions, also build model and load weights, which is basically just creating the EfficientNet model, so we have EfficientPose, and then we load the weights and so on.

But now we're going to run the program and see the results and how it actually works with our live webcam. So now we run the program. We can see that we're successfully opening up the different CUDA modules that we need. Here we can see that we're using a GPU, and we see the information about our GPU: I have an NVIDIA GTX 1060 with six gigabytes of
memory. So here we're using a GPU. We can see that we have loaded the weights, this is done, and then we're starting the inference, opening up our webcam and so on.

So this is the result that we get. Now we're just detecting this cup in the image frame. As I showed you in the code, we draw the name of the class that we're detecting, we draw the 2D bounding box, and we're also displaying the pose of the object that we're estimating. So this is a six-dimensional object detection: here we have our rotation, and then we also have our translation over here to the right.

If I just hold it pretty stable here, even though the bounding box is flickering a bit, we still see we get values around 0.8 here, 0.8 here, and 0.4, and around 30 to 40 over here, if we're just holding our camera still. If I start to move the camera, we can see that we get some changes in the values, and we also get some changes in the translation over here to the right.

So this is basically just estimating where the object is with respect to the camera. If you want to do it in some kind of global scene, you will have to calibrate your camera: you need to know where your camera is with respect to your global scene. Then you can estimate where the object is with respect to the camera, transform that from the camera to the world scene, and then you can detect where the object is in the whole world, or in your defined reference frame.

Here we move further back, and we can change the direction. We can also see it detects some other things, like a monitor up here at the top, but we're only interested in the cup. I can also move it in the other direction, and we can see that it actually changes to positive values for
our cup. Sometimes we lose track of it, and we can play around with the threshold, but now we can see that we're detecting a cup and also doing the pose estimation.

So this can be used for a lot of different things. This is actually a really cool model that you can just run, and you can run it on all these different objects that I showed you. It's actually a really good pose estimate that we get. Just think about it: we just need to open up this code, take in an image, pre-process it, pass it through the model with some pre-trained weights, and then post-process the output and display those outputs on the image, as you can see. You can choose any other object: you just pass in the index here in my code and run it on your own computer, and then you basically have a 3D object detector that is also doing pose estimation of that object at the same time. You can detect multiple objects in the scene and do pose estimation simultaneously on all those objects.

So thank you guys for watching. Again, remember the subscribe button and the notification bell under the video; it will help me and the YouTube channel out in a massive way. In the future I'm definitely going to do more videos about this EfficientPose model: how we can use it for all the different kinds of objects, how we can train on our own dataset, play around with it, make it more specific, and do more modifications to the code that they've provided for doing live inference on a webcam. Also how we can draw the 3D bounding boxes on the objects, because that's also really cool, and then both display the 3D bounding boxes and the coordinate system of the pose estimation and so on. So I'm really looking forward to that. In the meantime, if you're interested in computer vision, deep learning and so on, I also have tutorials about that. I'll link to one of them up here, or else I'll see you in the next video, guys. Bye for now.
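
The captions describe two steps without showing them: keeping only the detections whose label matches a chosen class (the cup, index 46 in the label file used in the video) and whose score clears the threshold, and moving an estimated position from the camera frame into a world frame once the camera's own pose is known. The following is only a rough sketch of those two ideas in plain Python. The function names, the dictionary layout, and all numeric values are assumptions made for illustration; none of them come from the EfficientPose repository or the video's script, and plain lists stand in for the arrays the model would return.

```python
# Illustrative sketch only: helper names and sample values are hypothetical,
# not taken from the EfficientPose code base.

CUP_LABEL = 46  # zero-based index of "cup" in the label file, per the video


def select_detections(labels, scores, rotations, translations,
                      wanted_labels, score_threshold):
    """Keep detections of the wanted classes whose score clears the threshold."""
    kept = []
    for label, score, rot, trans in zip(labels, scores, rotations, translations):
        if label in wanted_labels and score >= score_threshold:
            kept.append({"label": label, "score": score,
                         "rotation": rot, "translation": trans})
    return kept


def camera_to_world(point_cam, R_wc, t_wc):
    """Map a 3D point from the camera frame into a world frame.

    R_wc (3x3 nested lists) and t_wc (length 3) are the assumed, separately
    calibrated pose of the camera in the world: p_world = R_wc * p_cam + t_wc.
    """
    return [sum(R_wc[i][j] * point_cam[j] for j in range(3)) + t_wc[i]
            for i in range(3)]


if __name__ == "__main__":
    # Made-up model outputs for three detections.
    labels = [46, 62, 46]
    scores = [0.91, 0.80, 0.30]
    rotations = [[0.8, 0.8, 0.4], [0.0, 0.1, 0.0], [0.2, 0.1, 0.0]]
    translations = [[10.0, -20.0, 400.0], [0.0, 0.0, 900.0], [5.0, 5.0, 300.0]]

    cups = select_detections(labels, scores, rotations, translations,
                             wanted_labels={CUP_LABEL}, score_threshold=0.5)

    # Assumed camera pose: rotated 90 degrees about the world z axis and
    # shifted 1000 units along the world x axis.
    R_wc = [[0.0, -1.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0]]
    t_wc = [1000.0, 0.0, 0.0]

    for det in cups:
        print("cup at (camera frame):", det["translation"])
        print("cup at (world frame): ",
              camera_to_world(det["translation"], R_wc, t_wc))
```

Passing a set such as `{46, 62}` as `wanted_labels` gives the "multiple indexes" variant mentioned in the captions. The world-frame step only makes sense if the camera pose comes from a real calibration; the values here are placeholders.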
Info
Channel: Nicolai Nielsen
Views: 24,797
Keywords: python, opencv, machine learning, artificial intelligence, deep learning, opencv python, python opencv, computer vision, opencv tutorial, computer vision tutorial, 3d object detection, 3d object detection opencv, 3d object opencv, 3d pose estimation opencv, object detection opencv, 3d object detection mediapipe, 3d object detection deep learning, pose estimation opencv, pose estimation deep learning, 6d object pose estimation, estimate pose of objectt, pose estimation
Id: R7zWFy7JmXc
Length: 18min 44sec (1124 seconds)
Published: Thu Jul 21 2022