Object Detection OpenCV Python | Easy and Fast (2020)

Video Statistics and Information

Captions
Hey everyone, welcome to my channel. In this video we will create an object detector that has a good balance between speed and accuracy. We will be able to run it in real time while detecting multiple common objects, and the best part is that it does not require any third-party libraries other than OpenCV, which means we will be able to run it with only a few lines of code. So let's get started.

Here I am in the PyCharm environment. The first thing I have done is create this ObjectDetector project, and inside it we have a main.py file; we are going to write all of our code in this main.py file. Before we start, we are going to bring in a few things. The first one is our image, and then we have the coco.names file. These are the classes that we can detect, so we have person, bicycle, car and so on. If you want to know more about this data, you can check the COCO dataset, which is very common and very popular. And this is our image, lena.png, so we will detect the face on it, and once we do that we are going to run it with our webcam.

What else do we need? We need the weights and configuration files. What we are using is MobileNet SSD, so we are going to bring in its files. You can see here these are the latest files; this one is the old one from 2017, so I will delete that. This one is the actual weights file, and of course when you open it, it will be gibberish, so just close it. Then we have the architecture, or the configuration, which has all the information, and we can close that as well. These have slightly lengthy names, but I guess that's fine. This is the latest update, version 3, and you can see it was released at the beginning of 2020. These files are available on the OpenCV documentation, so I have downloaded them from there, and I will put a link to them in the description so you can download them as well.

The reason we are using MobileNet SSD is that it is one of the best methods right now with a good balance between accuracy and speed. We will be able to run it on a CPU at almost real time, and it will detect a lot of objects with a good amount of accuracy. With YOLO we have to use a GPU, otherwise it is too slow, and if we go for YOLO Tiny, it is not that good at detecting common objects. MobileNet SSD can also be used with a Raspberry Pi or a Jetson Nano because it is lightweight, and you still get a decent amount of accuracy. That is the reason we have chosen it, and since this is the latest update, I am hoping it will be better than before.

Now that we have all the files we need, we will go to File, Settings, then the Project Interpreter, and install OpenCV. This is the only library that we need, so we will search for opencv-python and install it. Once the installation is done we will close it, and over here we are going to write import cv2. The first thing we need to do is import our image, so we are going to say that our image equals cv2.imread, and we want to read lena.png. Then we can simply display it with cv2.imshow, let's call the window 'Output' and pass in our image, and then we write cv2.waitKey(0).
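Here is a minimal sketch of this first step, assuming lena.png sits next to main.py:

```python
import cv2

# Read the test image (assumes lena.png is in the project folder)
img = cv2.imread('lena.png')

# Show it in a window called "Output" and wait for a key press
cv2.imshow('Output', img)
cv2.waitKey(0)
```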
Let's right-click and run this, and there we have it: we can see our image. Okay, I have actually copied the wrong image, but it shouldn't matter that much.

Once we are done with that, we are going to import the names of the COCO dataset. The thing is that these are a lot of names; you can see here we have a total of 91, so I am going to import them automatically rather than writing them down manually. You could write, for example, classNames = ['person', 'car', ...] and so on, but that is a bad way to do it, because with around 90 different classes it would be far too lengthy. So instead we will simply import it. To do that, we are going to say that our class file is coco.names, and then we use with open to actually open the file in read mode, as f. Then we set our classNames, which starts as an empty list, to f.read(), strip it on a new line, and then split it, again on a new line. What this does is put all of the names into classNames. Let me print it out so you can see: print(classNames), and let's run that. There you go, now you can see we have person, bicycle, car, motorcycle, all of them in one single list.

Now what we will do is import our files. The first one is our configuration file. The path is a little bit lengthy, so we will write configPath equals the name of this file; what we can do is right-click the file, choose Rename, and copy the whole name instead of typing it out. I know it's lazy, but you know what, sometimes it's good to be lazy. Then we do the same thing for the weights, so weightsPath: double-click, right-click, Rename, copy and paste. Both of them are now good to go.

Now we can simply create our model. The good thing about OpenCV is that it already provides us with a function that does all the processing for us; all we have to do is pass in our configuration path and weights path and that's it. If you have seen my YOLO video, it was a little bit different: after passing our image to the net, we had to apply some techniques to actually extract the bounding boxes and everything. In this case it is very simple: all you have to do is pass the image and it will do everything for you, and at the end you get the bounding boxes and the IDs. You will not get the names directly, but from the IDs we can get the names of the objects detected. So here we are going to write net = cv2.dnn_DetectionModel, and we pass in our weights path and then our configuration path.
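A sketch of this setup step. The file names here are assumptions based on what is shown in the video (the frozen weights and the version-3 .pbtxt from the OpenCV documentation), so adjust them to match the files you downloaded:

```python
import cv2

# Load the COCO class names, one per line
classFile = 'coco.names'
with open(classFile, 'rt') as f:
    classNames = f.read().rstrip('\n').split('\n')
print(classNames)

# Paths to the MobileNet SSD config and weights (names assumed; use your downloaded files)
configPath = 'ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt'
weightsPath = 'frozen_inference_graph.pb'

# OpenCV's DetectionModel wraps the pre- and post-processing for us
net = cv2.dnn_DetectionModel(weightsPath, configPath)
```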
That is good. Then there are a few parameters that are set by default, and I'm not going to play around with those; these are the settings I found on the documentation and we are going to use the same ones, so I will just write them down. The first one is the input size, net.setInputSize, and that should be 320 by 320. Then we have net.setInputScale, and we write 1.0 divided by 127.5. Then we have the mean, net.setInputMean, with the values 127.5, 127.5, 127.5. Again, you don't have to worry about all these values, just follow along and think of them as something that is required to run; later on, when we do another tutorial where we use our own data to train the model, we will look into the details and I will explain what everything means. Finally, we write net.setInputSwapRB and set it to True.

The idea behind this tutorial is that we should be able to get an object detector up and running as fast as possible, without too many installations and too many formalities. If you want to use it in a robot or a self-driving car, you should be able to take this small amount of code, plug it in, and have object detection ready for you. That is the main idea behind this tutorial.

So where were we? We have done all of this; now we need to send our image to the model and it will give us the predictions. We write the class IDs, then the confidences, and then the bounding boxes, equals net.detect, and we want to detect on this image. One thing we have to define is the confidence threshold: at what point do we accept a detection as an actual object? If the model is 50 percent sure it's an object, we can say that is good enough for us; if it's lower than 50 percent, we ignore it. From the bounding box information we will draw rectangles on our objects, and we can also write the names based on the class IDs; we will see how to do that.

So far so good. What should we do next? Let's print this out: print the class IDs and print the bounding boxes, and let's see what happens. There we have it: it is giving us class ID number 1, and it is giving us a bounding box with four values that we can use to draw the rectangle. Here is the interesting part: it says this is class number one, and if we go to our names, the first one is indeed person. But the thing is that our classNames list starts from zero, not one, so when we refer to it we will have to subtract one from this value. That's not a big deal; we can do that.
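A sketch of the parameter setup and the detection call. To keep it self-contained, the image and model creation from the previous sketches are repeated at the top (file names assumed, as before):

```python
import cv2

# Re-create the test image and model from the previous sketches
img = cv2.imread('lena.png')
net = cv2.dnn_DetectionModel('frozen_inference_graph.pb',
                             'ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt')

# Pre-processing settings read out in the video (taken from the OpenCV docs for this model)
net.setInputSize(320, 320)
net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

# Run detection on the test image; returns class IDs, confidences and (x, y, w, h) boxes
classIds, confs, bbox = net.detect(img, confThreshold=0.5)
print(classIds, bbox)
```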
So let's do this. We are going to loop through the detections. Normally you would write something like for classId in classIds, but that refers to one list, and here we have three different lists of information we want to loop through: the class IDs, the confidences, and the boxes. We don't want to write three different for loops; we want one loop that gives us all the information, and to do that we can use the zip function. So we write for classId, confidence, box in zip of classIds, which we flatten, the confidences, again flattened, and at the end the bounding boxes. That is our for loop.

Inside it we simply draw a rectangle: cv2.rectangle on our image, passing in our bounding box, and then we define the color. Let's put the color as 0, 255, 0, which gives us green. Let's run this, and there we have it. Excellent, but it's a little too thin, so let's increase the thickness. Wait, what is the next parameter? It's thickness, okay, let me write it down: thickness equals 2, and there we have it. Now we can see that Lena is being detected properly.

What we can do next is write the name as well. Here we write cv2.putText; we want to put it on our image, and then we want to write the name. How do we get the name? We use classNames, and as I mentioned before, we need to subtract one from whatever class ID we have. Then we give the origin point of our text; here we will use the box, so we take the x and y position and bring it down a little so it sits inside the box, say x plus 10 and y plus 30. Then we have to decide on a font, so let's say cv2.FONT and pick any font, it doesn't matter, then the scale, let's put the scale as 2, then the color, let's keep it green, so 0, 255, 0, and the thickness we can put as 2. I think a scale of 2 will be too big. Yeah, it's too big, but we are getting the label as person, so we are getting the class properly. Let me just change the scale and see. Yes, that looks good: we have the person being detected and the appropriate bounding box around it. Excellent. Actually, this looks a little bit off because I think it should be capital letters, so let's add upper here and run it again. Yeah, that looks better.
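A sketch of the drawing loop, written here as a small helper so it can be reused for the webcam later. The function name and signature are my own, and any Hershey font works for the label:

```python
import cv2

def draw_detections(img, classIds, confs, bbox, classNames):
    """Draw a green box and the class name for every detection (sketch)."""
    for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
        x, y, w, h = box  # the detector returns boxes as (x, y, width, height)
        cv2.rectangle(img, (x, y), (x + w, y + h), color=(0, 255, 0), thickness=2)
        # classIds start at 1, classNames at 0, hence the "- 1"
        cv2.putText(img, classNames[classId - 1].upper(), (x + 10, y + 30),
                    cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
```

Calling draw_detections(img, classIds, confs, bbox, classNames) right after net.detect, followed by cv2.imshow, reproduces the boxed-and-labelled image described above.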
So we are detecting a person now, we have the bounding box, and everything is good. What can we do next? Now we will change this so that we can run it with an actual webcam rather than just an image. We will remove the image part, and instead we write cap = cv2.VideoCapture, and then the ID of our camera. Normally you would put 0, but that will probably not work for me because I use multiple cameras, so let me just write 1, and I will change it if it doesn't work. Then we can define certain parameters for how big our image is: we write cap.set with property ID 3, which is the width, and set it to 640, and then cap.set with property ID 4, the height, and set it to, let's say, 480.
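A quick sketch of the capture setup; the camera index is machine-dependent, so try 0 first if you only have one webcam, and the single-frame test at the end is just an addition here to confirm the camera works:

```python
import cv2

cap = cv2.VideoCapture(1)  # use 0 if you only have one camera
cap.set(3, 640)            # property 3 = frame width
cap.set(4, 480)            # property 4 = frame height

# Grab and show one frame to confirm the camera is working (not in the video)
success, img = cap.read()
cv2.imshow('Output', img)
cv2.waitKey(0)
```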
Again, that is not compulsory, but you can change it. Now, which part does not need to be in the loop? All of the setup is the initial part, so it does not need to be in the loop; everything starting from the detection does. So we write while True, move all of that inside the loop, and change the waitKey to 1. We also need to define our image, so we write success, img = cap.read(), and that gives us our frame. Do I need to change anything else? So far it doesn't seem so, okay, so let's run it and test it out.

Oh, it showed keyboard: we can see class 76, and 76 is indeed keyboard, so it is detecting properly, but we ran into an error. Why is that? AttributeError: tuple object has no attribute flatten. What is happening is that when nothing is detected, it is unable to process the loop line over here. So before we go in, we have to check that something was detected and the result is not empty. We can do that by writing if the length of classIds is not equal to zero. That should work, so let's run it again, and there we have it.

Now you can see I have my keyboard here, and it's detecting it very well. Then I have my monitor; it says laptop or TV, it's a little bit confused between those two, and if I move it around it again says laptop or TV. Let's try some other objects: we have a cup, then another cup, and we have, what is that, a toothbrush? Ah, not so accurate, but close enough, it's still a brush. Then we have the cell phone, and here we have a mouse, that is good. Let me check my laptop: on the laptop it says keyboard, and over here it says TV, maybe laptop, so it's overlapping a little, but it seems to be good. All of these are being detected quite well, and as you can see this is almost real time; I can move things around. This one I don't expect to work properly, it's a Stream Deck, and I wouldn't expect it to know what that is. But the good thing is that we are able to detect, again not with the best accuracy, but with a good amount of speed and a good amount of accuracy, so the balance is quite good.

What can we do next? The only thing that remains is to put our threshold at the top, so that if we want to change it we can do it up there. So at the top we write a threshold variable equal to 0.5, with a comment that this is the threshold to detect objects, and we pass it into net.detect. We can also write the confidence values on the image. To do that we simply put another putText call, and instead of the class name we write the confidence value. Should we bring it down or move it forward? Let's move it forward, so let's put plus 50 and see how it works. Okay, there's an error; I think we need to check what kind of value we are getting in confidence, because it doesn't seem to be able to write it. Oh, my bad, it's a number, I forgot; we have to convert it to a string. Let's run it.
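A sketch of the webcam loop at this stage, wrapped in a function so it stays self-contained. The function name, the threshold parameter name, and the q-to-quit key are my own additions; the video simply runs the loop inline:

```python
import cv2

def run_webcam(net, classNames, thres=0.5):
    """Detect objects on webcam frames in a loop (sketch of the approach above)."""
    cap = cv2.VideoCapture(1)   # camera index is machine-dependent; try 0 if 1 fails
    cap.set(3, 640)
    cap.set(4, 480)

    while True:
        success, img = cap.read()
        classIds, confs, bbox = net.detect(img, confThreshold=thres)

        # With no detections the result comes back empty, so guard before flatten()
        if len(classIds) != 0:
            for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
                x, y, w, h = box
                cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
                cv2.putText(img, classNames[classId - 1].upper(), (x + 10, y + 30),
                            cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)

        cv2.imshow('Output', img)
        if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit (added for convenience)
            break
```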
And there we have it. Now it looks pretty bad because the text is overlapping, so what we can do is move it further along; let's write 150, maybe that will work. If I hold the mouse here, it tells me 0.6, 0.5 that it's a mouse, and it's fluctuating quite a bit, and here we have the keyboard. The raw value does not look great, so what we can do is put 200 for the offset, multiply the confidence by 100, and round it off with round to two decimal places. Let's see. Yes, now it's much more visible: you can see the mouse is 67, 66, something like that. That is good.

So this is it for today's video. I hope you have learned something new. If you like the video, give it a thumbs up, and I will see you in the next one.
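For reference, a sketch of the complete webcam script assembled from the steps described above. The file names, camera index, and variable names are assumptions based on what is shown in the video, so adjust them to your setup:

```python
import cv2

thres = 0.5  # confidence threshold to detect objects

# Class names from the COCO dataset, one per line
with open('coco.names', 'rt') as f:
    classNames = f.read().rstrip('\n').split('\n')

# MobileNet SSD v3 files from the OpenCV docs (names assumed; use your downloads)
configPath = 'ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt'
weightsPath = 'frozen_inference_graph.pb'

net = cv2.dnn_DetectionModel(weightsPath, configPath)
net.setInputSize(320, 320)
net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

cap = cv2.VideoCapture(1)  # try 0 if you only have one camera
cap.set(3, 640)
cap.set(4, 480)

while True:
    success, img = cap.read()
    classIds, confs, bbox = net.detect(img, confThreshold=thres)

    if len(classIds) != 0:
        for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
            x, y, w, h = box
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(img, classNames[classId - 1].upper(), (x + 10, y + 30),
                        cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
            # Confidence shown as a percentage, rounded to 2 decimals, offset to the right
            cv2.putText(img, str(round(confidence * 100, 2)), (x + 200, y + 30),
                        cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)

    cv2.imshow('Output', img)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # q to quit (added for convenience)
        break
```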
Info
Channel: Murtaza's Workshop - Robotics and AI
Views: 170,222
Rating: 4.9188733 out of 5
Keywords: object detection opencv python, object detection opencv, object detection python, fast object detection, opencv python, mobilenet ssd, opencv mobilenet ssd, ssd object detector, object detector mobilenet ssd, ssd mobilenet, deep learning, deep learning opencv, dnn, dnn cv2, cv2 object detection, computer vision object detection
Id: HXDD7-EnGBY
Length: 29min 5sec (1745 seconds)
Published: Sun Aug 30 2020