Live Object Detection in Python

Captions

[Music] What is going on, guys, welcome back. In today's video we're going to implement an object recognition tool in Python, so let's get right into it.

All right, the first thing we need to do is set up the project structure, because for this project we're going to need a couple of external files. We're not just going to install some library and write some code. The object recognition will be done with a pre-trained model: we're not going to train, test, or build the model ourselves; we're going to use a model that already exists. For this we go to github.com, to the user chuanqi305 (I'll put a link to the repository in the description down below), and the repository MobileNet-SSD. From there we get a Caffe model and a .prototxt file, which we then use in Python with OpenCV to do the object recognition.

Now, there are many ways to do this. You can use YOLO, or different models, algorithms, and procedures; we're going to do it like this in this video, and maybe we'll do it another way in another video. To get the model file, go to the README, and for the network MobileNet-SSD click "deploy" under download; this is how you get the Caffe model. The .prototxt file is located in the voc directory, and what you want is the MobileNetSSD_deploy.prototxt file. Download these two files and put them into the directory you're working in: you have your main.py file, then you create (if you want to) a directory called models, and inside of it you put those two files, the .caffemodel file and the .prototxt file.

In addition to that, I also have a bunch of images here. You can choose whatever images you like; I just have some images of rooms with people, and of streets, because we're going to recognize objects like trucks, cars, people, tables, chairs, sofas, and so on. Put them in the same directory.

Besides that, I want to mention that this code is of course heavily influenced by this repository, because that's the person who provided the models; this is where I got the models and the knowledge from. I also want to mention that I got a fair bit of inspiration from the website PyImageSearch, where I learned some of how to use this model. They were not my main source or my only source, because I looked at a couple of different repositories and blogs, but I think it's fair to give them credit, since I learned how to use this model properly from their blog post.

Okay, so we start by opening up cmd and installing the libraries we need today: NumPy, with pip install numpy, and OpenCV, with pip install opencv-python. Those are the two libraries we need. I've already installed them, so I'm not going to do it again. Then we import numpy as np, and we import cv2, which is OpenCV: even though we install opencv-python, we import cv2.

All right, the first thing is to specify the paths to all the relevant files: where is the image we're going to do the object recognition on, where is the model, and where is the .prototxt file? First the image: image_path is going to be, let's start with, room_people.jpg. Then prototxt_path is going to be models/MobileNetSSD_deploy.prototxt, and model_path is going to be models/MobileNet…caffemodel.

We're also going to set a parameter right away. Think about it like this: when the model detects a certain object in an image, it has a certain confidence. If it sees a chair, it might say "I'm 99% sure this is a chair," or it might say "I have 15% confidence that this is a chair." We need to set a minimum confidence, and if the model doesn't reach it, it won't report a detection; it won't say "this is a chair" if it's not confident enough. A minimum confidence that I think works pretty well is 20%, so min_confidence = 0.2.

Next we need a classes list, and this is something you need to get from the repository, because you need to know the classes for this particular model. The classes go into a list, with entries like background, aeroplane, bicycle, and so on. I'm going to paste that list in now. If you want to copy it as well, you can go to the repository and look at the individual classes, or you can pause the video and type it in. It's not too big, but I don't want to waste time in a tutorial typing all these strings, since it's not really educational. Those are all the classes we'll be able to detect: bird, cow, cat, person, motorbike, train, tvmonitor, and so on.

Before we actually load the model, we also need to think a little about how we're going to display a detected object. If the model detects this guy as a person, we put a rectangle around him, saying "this area is a person." If we have a bunch of people, we want all of their rectangles to have the same color; say all people are shown in blue. If we classify something as a sofa, we want it to be not blue but something else, for example green or red. To get random colors, we generate a colors list: colors = np.random.uniform(0, 255, size=(len(classes), 3)). That gives one color for each class, made of three values between 0 and 255, one per color channel (OpenCV interprets them in BGR order).

The problem is that this random uniform distribution sometimes results in pretty bad color choices, for example very similar colors. If you want to always get the same result, you can use a seed: call np.random.seed with a certain value before generating the colors, and you'll always get the same quote-unquote random result; it's a deterministic random result. A seed that worked quite well for me is 543210, so np.random.seed(543210).
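The configuration described above can be sketched as follows. The class list is the 21-entry PASCAL VOC list (background plus 20 object classes) used by the VOC-trained MobileNet-SSD model, but it is worth double-checking against the repository before relying on the indices. The file names (in particular the .caffemodel name) and the helper make_colors are assumptions for illustration, not the exact code from the video.

```python
import numpy as np

# Assumed file locations, following the layout described in the video:
# main.py next to a models/ folder. The .caffemodel name is a placeholder;
# use whatever name the downloaded file actually has.
IMAGE_PATH = "room_people.jpg"
PROTOTXT_PATH = "models/MobileNetSSD_deploy.prototxt"
MODEL_PATH = "models/MobileNetSSD_deploy.caffemodel"

# Detections below this confidence are ignored (0.2 is the value used in the video).
MIN_CONFIDENCE = 0.2

# PASCAL VOC labels in the order the model reports them; index 0 is background.
CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def make_colors(num_classes, seed=543210):
    """Return one random 3-channel color per class; a fixed seed makes them repeatable."""
    # A dedicated RandomState avoids reseeding NumPy's global generator.
    rng = np.random.RandomState(seed)
    return rng.uniform(0, 255, size=(num_classes, 3))


COLORS = make_colors(len(CLASSES))
```

Passing seed=None draws fresh colors on every run, which matches what happens in the video when no seed is set.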
This worked quite well; I got some good colors there.

Now we're going to load the neural network, the pre-trained model, into our script. We don't need to train it or do anything else with it; we only need to load it and then feed data into it for prediction. So we say net = cv2.dnn.readNetFromCaffe(prototxt_path, model_path), with the .prototxt path as the first parameter and the model path as the second. All we need to do now is forward an image through the model; we then get the result of the detection (or classification, you could say), and we can draw it onto the image.

So we need to load an image, resize it so that it fits into the neural network, because the network accepts 300 × 300 pixel images as far as I know, get a prediction, and then draw the detections onto the original full-size image. First we say image = cv2.imread(image_path). Then we store the current shape: height and weight (the second name is a typo for width, which I fix later) are image.shape[0] and image.shape[1].

Then we create a blob. In Caffe and in OpenCV's dnn module, a blob is simply the multi-dimensional array that the network takes as input; here it is 1 × 3 × 300 × 300 (batch, channels, height, width). We say blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007, (300, 300), 130). The first argument is the image resized to 300 × 300. The second parameter is the scale factor, which is what the image is multiplied by; we're going to use 0.007 here. Then we specify the size of the image, which is obviously 300 × 300, and the last parameter is the mean, which is subtracted from the image before scaling; in this case 130. You can play around with these two values, the scale factor and the mean, and see whether you get better results if you increase or decrease them slightly. Just experiment with that.

Now that we have the blob, to get the prediction we call net.setInput(blob), which sets the blob as the input of the network, and then detected_objects = net.forward() (not feed, forward). So we set the input, we forward, and we get the result as the return value. Technically speaking, we're now done with the object detection, but of course we need to visualize it. As a first step, let's just look at what forward returned: print(detected_objects[0][0][0]). I'll explain the indices in a second.

When I run this, we see a bunch of values. Several different objects were detected in the image, and this is the data for the first detected object. If I want to see the second detected object, I just change the third index to 1. Provided we have enough objects, I can change it to 15.
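The loading and preprocessing steps can be sketched like this. cv2.dnn.readNetFromCaffe, cv2.dnn.blobFromImage, setInput and forward are the OpenCV calls used in the video; blob_like_opencv is a hypothetical NumPy helper that spells out what blobFromImage computes for an already-resized image (subtract the mean, multiply by the scale factor, and reorder height × width × channels into batch × channels × height × width), so the arithmetic can be checked without OpenCV installed. Writing the mean as a per-channel tuple is a choice made here so that it is unambiguously subtracted from all three channels.

```python
import numpy as np

SCALE_FACTOR = 0.007      # value used in the video; tune if detections look off
MEAN = 130.0              # value used in the video
INPUT_SIZE = (300, 300)   # network input size (width, height)


def blob_like_opencv(resized_bgr, scale=SCALE_FACTOR, mean=MEAN):
    """Hypothetical NumPy equivalent of blobFromImage on an already-resized image."""
    pixels = resized_bgr.astype(np.float32)
    normalized = (pixels - mean) * scale  # subtract the mean, then scale
    # HWC -> CHW, then add a leading batch axis: (1, 3, H, W)
    return normalized.transpose(2, 0, 1)[np.newaxis, ...]


def run_detector(image_path, prototxt_path, model_path):
    """Load the Caffe model, run one forward pass, and return (image, detections)."""
    import cv2  # imported here so blob_like_opencv works without OpenCV

    net = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)
    image = cv2.imread(image_path)
    if image is None:  # imread returns None instead of raising on a bad path
        raise FileNotFoundError(image_path)
    blob = cv2.dnn.blobFromImage(
        cv2.resize(image, INPUT_SIZE), SCALE_FACTOR, INPUT_SIZE, (MEAN, MEAN, MEAN)
    )
    net.setInput(blob)
    detections = net.forward()  # shape (1, 1, N, 7), one row per detection
    return image, detections
```

With these settings a pixel value of 130 maps to 0.0 and 230 maps to 0.7, which makes it easy to reason about how changing the two values shifts the network's input range.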
I'm not sure we have 15 objects; let's see. No, doesn't seem like it. So let's say 8, which is actually the ninth detected object, and you can see some other values. Here's what they mean. The first value is the index of the image within the batch, which is always 0 here since we pass a single image. The second value is the class index: 18, for example, refers to the element at index 18 of the classes list, which is the 19th element, since indexing starts at zero. This is the thing our model actually classified or detected. The third value is the confidence, and the last four are the coordinates: the upper-left x, the upper-left y, the lower-right x, and the lower-right y. All of these coordinates are normalized, so we multiply them by the image width or height to get the actual pixel coordinates, and that's how we'll draw the rectangles around the objects.

So let's get into coding: for i in range(detected_objects.shape[2]):
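The output layout described above (one row of seven numbers per detection) can be turned into pixel boxes with a small helper. This is a sketch assuming the usual SSD layout [image_id, class_index, confidence, x1, y1, x2, y2] with coordinates normalized to 0..1; parse_detections is a hypothetical name, and it only uses NumPy, so it can be tried on a hand-made array.

```python
import numpy as np


def parse_detections(detections, width, height, min_confidence=0.2):
    """Convert a (1, 1, N, 7) SSD output array into (class_index, confidence, box) tuples.

    Each box is (x1, y1, x2, y2) in pixels: the upper-left and lower-right corners.
    """
    results = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])  # keep as float: int() would give 0
        if confidence <= min_confidence:
            continue
        class_index = int(detections[0, 0, i, 1])
        # Coordinates are normalized, so scale x by the width and y by the height.
        x1 = int(detections[0, 0, i, 3] * width)
        y1 = int(detections[0, 0, i, 4] * height)
        x2 = int(detections[0, 0, i, 5] * width)
        y2 = int(detections[0, 0, i, 6] * height)
        results.append((class_index, confidence, (x1, y1, x2, y2)))
    return results
```

For example, a row [0, 15, 0.95, 0.1, 0.2, 0.5, 0.8] on a 640 × 480 image becomes class 15 (person) with the box (64, 96, 320, 384).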
Depending on how many objects we've detected, we loop over them and get the confidence for each detection: confidence = int(detected_objects[0, 0, i, 1]). Wait, no, sorry: index 1 is the class index and index 2 is the confidence, so confidence = int(detected_objects[0, 0, i, 2]). Then, if the confidence is larger than the minimum confidence we require for a detection, we draw a rectangle and put a text label there.

First, the class index that was recognized: class_index = int(detected_objects[0, 0, i, 1]). I'm writing the indices as [0, 0, i, 1] here; on a NumPy array this is the same as [0][0][i][1], so either way of writing it is fine.

Then we get the coordinates: upper_left_x = int(detected_objects[0, 0, i, 3] * width). This is important: we need to multiply by the width, because x is normalized and we want it in pixels of the image we loaded. By the way, I see that I used weight as the variable name earlier; we obviously want it to be width. Then upper_left_y is index 4 times height, lower_right_x is index 5 times width, and lower_right_y is index 6 times height. Now we have the exact pixel coordinates of the corners: if this is the upper left and this is the lower right, then this area is the person, and that's what we'll plot.

But first we build the prediction text as an f-string containing classes[class_index]. Keep in mind that class index 18 means the 19th class, since we start at zero; that's what I was trying to say before. So the class index takes us to the corresponding string in the list, and that goes into the text. As a second thing, we also want to know the confidence, how sure the model is that this is whatever it thinks it is, so we add the confidence formatted as a float with two decimal places (:.2f).

Now we only need to draw the rectangle and put the text there. We call cv2.rectangle and pass the image itself, then a tuple of upper_left_x and upper_left_y, then a tuple of lower_right_x and lower_right_y, then the color, colors[class_index] (hopefully well randomized), and finally a thickness of 3. Then we call cv2.putText, which has a bunch of parameters. I always have to look them up, because parameters always confuse me: never memorize parameters, always look up the documentation or your sample code. Here we pass the image again, the prediction text we just built, and then the position: upper_left_x, and for y, upper_left_y - 15 if upper_left_y > 30 else upper_left_y + 15. So if there's enough room above the box (upper_left_y is greater than 30 pixels), the text goes 15 pixels above the box; otherwise it goes 15 pixels below the top edge, inside the box.
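The drawing step can be sketched as three small functions. label_text, label_origin and draw_detection are hypothetical names; the "name: NN.NN%" caption format is an assumption (the video formats the confidence with two decimals, but the exact string isn't fully audible), while the 15-pixel offset, the 30-pixel threshold, the font, and the thicknesses follow the values described above. Only draw_detection needs OpenCV.

```python
def label_text(class_name, confidence):
    """Build the caption, e.g. 'person: 50.00%' (percent formatting is an assumption)."""
    return f"{class_name}: {confidence * 100:.2f}%"


def label_origin(x1, y1, margin=15, min_top=30):
    """Place the caption above the box if there is room, else just below its top edge."""
    return (x1, y1 - margin) if y1 > min_top else (x1, y1 + margin)


def draw_detection(image, box, text, color):
    """Draw one box and its caption onto image in place (requires OpenCV)."""
    import cv2

    x1, y1, x2, y2 = box
    bgr = tuple(int(c) for c in color)  # pass plain ints rather than NumPy floats
    cv2.rectangle(image, (x1, y1), (x2, y2), bgr, 3)
    # The text origin must be a single (x, y) tuple, not two separate arguments.
    cv2.putText(image, text, label_origin(x1, y1),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, bgr, 2)
```

Keeping the position logic in label_origin makes the tuple requirement of putText explicit and lets the placement rule be checked on its own.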
Right, so like that, and then we also pass the font; maybe we should add a line break in here somewhere. We use cv2.FONT_HERSHEY_SIMPLEX, then a font scale of 0.6, then again the color colors[class_index], and a thickness of 2. By the way, all these parameters are just design choices; they're not important for the actual detection process, so you can play around with them and change them.

Once we have that, we're done, and we only need to show the image: cv2.imshow with some title, say "Detected Objects", and the image, then cv2.waitKey(0), and maybe also cv2.destroyAllWindows().

So let's see whether we made any mistakes and whether it works. There you go, but we don't have any predictions. Why is that? Did we forget some scaling or something like that? Maybe let's turn down the confidence; maybe that's the issue, but I don't think so. I think we have some other problem here. Okay, I'm going to take a look at that and come back to you.

All right, it wasn't too big of a deal. We had two problems. First, in the putText call, the position, upper_left_x together with the if/else expression for y, needs to be wrapped in a tuple, the same way we pass the corners to cv2.rectangle. Second, we shouldn't be using int for the confidence: the confidence is always a value between 0 and 1, so int() truncates it to 0 and nothing ever passes the minimum-confidence check. That was not a very intelligent choice. So we remove the int, put the position into a tuple, and then it works.

When I run main, you can see that we get the classifications. With 50% confidence this is a sofa. This one is not quite right: it says this is a chair, though it could be a chair or a sofa. This is a person, this is a person, this is a person, this is a person. What else do we have? A dining table, which is also right. Let's change the image to just room.jpg, another image of a room with people: person, person, person, dining table, chair, and chair, so this is fine. Now let's go to the street image, street.jpg, and you can see car, car, car, car, car, person, person, person, person, person. I don't know if we have a class like truck; it doesn't seem like it. Maybe bus, but it didn't recognize a bus, because there is no bus. Do we have any bicycles here? No, I don't see any. So this is quite good.

Again, if you change the seed, you get different colors. Let's go back to the first image, room_people.jpg, and look at the colors: turquoise for person, blue for chair, green for dining table, and pink or purple for sofa. If I change the seed, we get different colors, as you can see, and if I don't provide a seed at all, we get different colors on every run. So if you want to keep it consistent, provide a seed; 543210 is a pretty good one in my opinion, and you always get the same colors, as you can see here. And this is how you implement object detection in Python.

Last but not least, I also want to show you how to apply this live to your camera data. Instead of loading a static image, let's get rid of that and open a camera: cap = cv2.VideoCapture(0). I pass 0, which is the first camera; I only have one, so it's going to pick this one anyway. Then we say while True:, and on every iteration we read a frame: _, image = cap.read(). The first return value is a success flag, which we can name _ because we're not going to use it; the second is the image from the camera, so the image you currently see in the top-right corner is what we feed into the neural network. We take all the processing code and indent it into the loop, and then the process is the same: we take this image, do the same steps as before, and classify everything. The imshow call also needs to go inside the loop, so we show the image on each iteration, and we put the waitKey call in there as well. After the loop we call cv2.destroyAllWindows() and cap.release(). (As written, while True never exits, so you'd need a break condition, for example on a key press, for those cleanup calls to actually run.)

To show you that it works, I need to turn off my recording camera real quick, so I'm going to disable it, and then we'll do a demonstration.

All right, I've now disabled the camera for recording, so we can use it in Python. One thing I changed is the waitKey argument, from 0 to 5. This is important: the argument is a delay in milliseconds, and 0 means wait indefinitely for a key press, which just freezes your image. Pick a small positive number such as 1, 2, 3, 4, or 5; it sets how long each frame waits, and with it roughly the frame rate. Just don't pick 0. Once you have that, you can run this, and you'll see that it uses the camera data for the object detection. Once it's loaded, you should see the camera. There you go: person with 99% confidence, and chair, although that back there is not a chair. The problem, of course, is that if we look at the classes, there isn't much here it can classify. Maybe a bottle? Do I have a bottle in here somewhere? Not really, but I have honey; let's see if that works. Yes, it classifies it as a bottle, as you can see. It's not really a bottle, but there's no class for it, so we can't really blame the model. It can classify sheep, horses, and dogs, but I don't have any of those here. The chair, I don't know; maybe if I stand up it's going to classify this as a chair, or maybe if I turn around a little bit. No. However, as you can see, it works live with camera data as well.

All right, so that's it for today. I hope you enjoyed it and learned something. If so, let me know by hitting the like button and leaving a comment in the comment section down below, and of course don't forget to subscribe to this channel and hit the notification bell so you don't miss a single future video for free. Other than that, thank you very much for watching, see you in the next video, and bye. [Music]
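The live version can be sketched as a loop around the same per-frame processing. process_frame is a hypothetical callback that takes a BGR frame and returns it with boxes drawn (built from the detection steps described in the video); the q-to-quit key and the check on the success flag from cap.read() are additions not in the video, included so the loop can end and the cleanup calls actually run.

```python
def is_quit_key(key, quit_char="q"):
    """True when cv2.waitKey returned the quit character (only the low byte matters)."""
    return (key & 0xFF) == ord(quit_char)


def run_live(process_frame, camera_index=0, delay_ms=5):
    """Show processed webcam frames until 'q' is pressed or the camera stops."""
    import cv2  # imported here so is_quit_key can be used without OpenCV

    cap = cv2.VideoCapture(camera_index)  # 0 = first camera
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # camera unavailable or stream ended
                break
            cv2.imshow("Detected Objects", process_frame(frame))
            # delay_ms must be > 0: waitKey(0) blocks until a key is pressed.
            if is_quit_key(cv2.waitKey(delay_ms)):
                break
    finally:
        # Runs on normal exit and on errors, so the camera is always released.
        cap.release()
        cv2.destroyAllWindows()
```

waitKey returns -1 when no key is pressed; masking with 0xFF turns that into 255, which never matches the quit character, so the loop simply keeps going.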

Info

Channel: NeuralNine

Views: 9,769

Rating: 4.978456 out of 5

Keywords: python, object detection, object recognition, live object detection, opencv, python object detection, python object recognition, opencv object detection, opencv object recognition, python opencv, opencv tutorial, python opencv tutorial, computer vision, detect objects, recognize objects, machine learning, neural network, neural networks, deep learning

Id: lE9eZ-FGwoE

Length: 25min 5sec (1505 seconds)

Published: Sun Aug 08 2021