YOLO object detection using OpenCV with Python

Video Statistics and Information

Captions
Hi, and welcome to this new video tutorial. Today we're going to look at YOLO object detection. But first, what is YOLO? YOLO is a deep learning object detection algorithm that came out in May 2016, and it's so popular because it's really fast compared with the other algorithms: the difference is that YOLO is able to detect the objects in only one pass, while the other algorithms need to scan the image many times.

But how does YOLO work, and how do you set it up? Let me explain this quickly, because there is some confusion around it, both from what I see on the forums and from what I experienced myself. YOLO is the algorithm, and a deep learning algorithm needs a framework to run. YOLO came out in 2016 with the Darknet framework, which was built for YOLO; so we have Darknet, built for YOLO, and it works only on Linux. After Darknet there also came an adaptation of this algorithm for TensorFlow, another deep learning framework: with Darkflow you can use YOLO with TensorFlow, and so it works on Linux, Windows, and Mac as well. So on one side we have Darknet, and on the other side Darkflow plus TensorFlow, which have to work together. But since OpenCV 3.2, OpenCV also has its own framework that is compatible with YOLO, and that's what we are going to use today. With the OpenCV framework we don't need to install anything else for now; we will just write the code to do the detection using OpenCV alone. Let's start.

First we need to download three files: yolov3.cfg, yolov3.weights, and coco.names. This is YOLO version 3, hence the "v3". The .cfg is the configuration file, the .weights file is the trained model that detects the objects, and coco.names is the dataset class list, so we can only detect objects that are in this dataset. Let's check this file: these are all the objects we are going to be able to find in a picture using YOLO detection, so a person, a bicycle, a car, and so on.
In total there are 80 classes in the COCO dataset. Once we have these three files, which you can download from the link I put somewhere below the video, we can start typing the code.

We import cv2 and we import numpy as np, and for the libraries that is all we need. Now let's load the YOLO algorithm. We load the network: net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg"), where dnn stands for deep neural network and the arguments are the files we downloaded. Once we have this, we need to load the classes from the file, so that classes looks something like ["person", "bicycle", "car", ...]. We simply need them in an array, loaded from the coco.names file, so we do it this way: we open "coco.names" for reading as f, and classes = [line.strip() for line in f.readlines()], which simply puts them into an array. To show you what it looks like, I print classes and run the script. It's empty, so I probably made some mistake... it was a typo; let's run it again, and now you see all 80 classes in the array.

OK, now let's keep going with the net object we created, still regarding the algorithm. We define layer_names = net.getLayerNames(), and then we need to get the output layers: output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]. It looks like there is some mistake... OK, it's layer_names, not layers_names. In simple words, what we are trying to do right now is to get the output layers, because those are what we need to get the final results, the objects displayed on the screen.
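The loading step can be sketched without the actual model files. Below, the class list is read from an in-memory stand-in for coco.names, and layer_names and out_indices are made-up placeholders for what net.getLayerNames() and net.getUnconnectedOutLayers() would return (building the real network requires the weights and cfg files):

```python
import io

# Stand-in for open("coco.names", "r"); the real file has 80 lines.
fake_names = io.StringIO("person\nbicycle\ncar\n")
classes = [line.strip() for line in fake_names.readlines()]
print(classes)  # ['person', 'bicycle', 'car']

# Made-up placeholders for net.getLayerNames() and
# net.getUnconnectedOutLayers(); the indices OpenCV returns are
# 1-based, which is why the code subtracts 1.
layer_names = ["conv_0", "relu_0", "yolo_82", "conv_1", "yolo_94", "yolo_106"]
out_indices = [3, 5, 6]
output_layers = [layer_names[i - 1] for i in out_indices]
print(output_layers)  # ['yolo_82', 'yolo_94', 'yolo_106']
```

Depending on the OpenCV version, getUnconnectedOutLayers() returns either a flat array of indices or an array of one-element arrays, which is why the tutorial writes layer_names[i[0] - 1].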
With the output layers we can get the detection of the objects. This is all we need to load the algorithm; now let's load the image, and then we will do the operations with it. First let me get some image... OK, I just put an image in the folder, so I'm going to load it now: img = cv2.imread(...) with the name of the image file. Let's display it: cv2.imshow("Image", img), then cv2.destroyAllWindows(); I forgot cv2.waitKey(0) to keep the window open. Now let's display it on the screen: the image is too big and we almost don't see it whole, so I'm going to shrink it a bit with cv2.resize. I'm not giving a specific size, but scaling the width by 0.4 and the height by 0.4. Let's run this one. OK, this is the room where I work, and we're going to detect as many objects as possible in this image.

Now, what we need to know is that we cannot give this image directly to the algorithm: we need to do some operations on it first. We need to convert it into a blob; a blob is a way to extract the features from the image. Let's do this operation here and call it "detecting objects". First we get the blob from the image: blob = cv2.dnn.blobFromImage, and we want to get a blob from img; then the scale factor, for which the commonly used value is 0.00392 (that is, 1/255); then we need to define the size, and the standard is 416 by 416, which is the size of the image as we are going to pass it to the YOLO algorithm. For the moment let's leave this size, and later we can check other options. Then there is a technical parameter regarding mean subtraction, (0, 0, 0), and then True: with this we are inverting blue with red, because OpenCV works with the BGR format, so each image has three channels ordered blue, green, and red, while normally
the network needs them as red, green, and blue, so with swapRB=True we swap the channels. Finally crop=False: we are not going to crop the image, because we want to detect everything in it. And this is the blob. Let me quickly show what's inside it: for b in blob, then for img_blob in b... I'm doing this just to show you what's inside. I was mixing up img with img_blob here, and I also use enumerate so that I can give each window a different name, str(n), so that several windows appear on the screen. OK, this first one is the original image; let's not show that for a moment. Then we have windows 1, 2, and 3... no, it's still the original image; my mistake, I need to show img_blob here, not img. Let's run again. OK, that's what we need: these are the three images of the blob, one per channel, the blob for red, for green, and for blue. You don't see much difference because the colors in this image are balanced, but with different images you might be able to see that they differ.

We have the blob, so the image is now ready to be processed by the YOLO algorithm. We need to pass this blob into the network, so we say net.setInput(blob), and now we can get the output: outs = net.forward(output_layers). With the forward function we are saying that we want to run the network through to the output layers to get the final result. Let's quickly print outs and run the script. What I can say right now is that outs already contains all the information about the objects detected on the screen; we just need to extract it.
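The pixel transformations that blobFromImage applies can be reproduced in plain NumPy; the block below is a sketch of the scaling, channel swap, and HWC-to-NCHW reshuffle, using a synthetic 416x416 array so no resize is needed:

```python
import numpy as np

# Synthetic BGR image standing in for the loaded photo.
img = np.zeros((416, 416, 3), dtype=np.uint8)
img[..., 0] = 200   # blue channel
img[..., 2] = 50    # red channel

# Equivalent of cv2.dnn.blobFromImage(img, 1/255, (416, 416), (0, 0, 0),
# swapRB=True, crop=False) when the input is already 416x416:
scaled = img.astype(np.float32) * (1 / 255)      # scale factor ~0.00392
swapped = scaled[..., ::-1]                      # BGR -> RGB (swapRB=True)
blob = swapped.transpose(2, 0, 1)[np.newaxis]    # HWC -> 1x3xHxW

print(blob.shape)        # (1, 3, 416, 416)
print(blob[0, 0, 0, 0])  # first channel is now red: 50/255
```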
For example, we need to extract the rectangle, so first the position: if we detected the chair, we will extract the top-left point and the bottom-right point of its rectangle; then the name from the class ("OK, this is a chair"); and a few other pieces of information that we will see one by one. So let's go ahead and loop through this outs object that has all the info we need; let's call this part "showing information on the screen". We loop with for out in outs, and then we loop again with for detection in out.

From each detection we need the confidence, which means how confident the algorithm is that the detection was correct. For example it can say "this is a table, but I am not sure", which would be a low confidence, or "this is a table and I'm confident it's a table", so close to 100 percent confidence. We get the confidence in three steps. First, scores = detection[5:], taking the class scores from the detection vector. From the scores we get the class ID: class_id = np.argmax(scores); the class ID is the number which, looked up in classes, tells us what object this is. And finally confidence = scores[class_id].

Now we can work with the confidence: confidence in this case goes from 0 to 1, so let's try 0.5 for now, and later we can adjust this value as we wish. If the confidence is greater than 0.5, we can say an object was detected.
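The three confidence steps can be tried on a made-up detection row; here the row has only three class scores instead of COCO's 80, and all the values are invented for illustration:

```python
import numpy as np

# Fake detection row: [center_x, center_y, w, h, objectness, class scores...]
detection = np.array([0.5, 0.4, 0.2, 0.3, 0.9, 0.10, 0.80, 0.05])

scores = detection[5:]                # per-class scores
class_id = int(np.argmax(scores))     # index of the best class
confidence = float(scores[class_id])

print(class_id, confidence)  # 1 0.8
```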
Now let's get the coordinates of this object: we have the center of the object, and also its width and height. So center_x = int(detection[0] * width), center_y = int(detection[1] * height), then w = int(detection[2] * width) and h = int(detection[3] * height). These values from the network are normalized, so they must be multiplied by the original height and width of the image. It should be clear to you why we do this operation; if not, let me tell you quickly: we have the original image, but when we go to detect the objects we convert it into a blob, and the blob extracts a 416 by 416 image in this case, so we need to keep track of the original size. So height, width, channels = img.shape gives us the height and the width.

We have center_x and center_y, so let's quickly just draw a circle: cv2.circle(img, (center_x, center_y), 10, (0, 255, 0), 2), so each detected object will have a green circle of radius 10 and thickness 2 around its center. Now let's run the script. OK, we see a few small circles: one here on the chair, so most likely the chair was detected; one on the computer, so it might be that the monitor was detected; one on the monitor; and here one on some books, so maybe a book was detected.

Now, from the circle, which is the center of the object, we need to get the rectangle. We know the center, and we also know the width and the height of this object, from the top of the chair to the bottom of the chair, so using these values let's get the top-left point and the bottom-right point. This is basic geometry, not so hard to do, so let's do it right here under "rectangle coordinates": x = int(center_x - w / 2), which gives us the top-left x, and y = int(center_y - h / 2), the top-left y; together they make the top-left point. And with this we can draw the rectangle: cv2.rectangle on img, with the top-left point (x, y) and the bottom-right point (x + w, y + h).
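The mapping from the network's normalized center/size values to a pixel-space box can be checked on paper; the numbers below are invented, with a pretend original image of 800x600:

```python
# Invented normalized outputs for one detection.
cx_n, cy_n, w_n, h_n = 0.5, 0.5, 0.2, 0.3
width, height = 800, 600              # original image size, from img.shape

center_x = int(cx_n * width)          # 400
center_y = int(cy_n * height)         # 300
w = int(w_n * width)                  # 160
h = int(h_n * height)                 # 180

# Center + size -> top-left and bottom-right corners.
x = int(center_x - w / 2)
y = int(center_y - h / 2)
print((x, y), (x + w, y + h))  # (320, 210) (480, 390)
```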
Let's make the rectangle green, (0, 255, 0), with a thickness of 2, and run the script. OK, now the detection starts to look interesting: we see the chair correctly detected, the laptop detected, the monitor detected, and one or two books detected. Not so great a result yet, but at least we have some interesting information, so let's go further. Let me also delete the part with the blob visualization, which we don't need, and start the script again.

What else can we do? Well, for sure we need to know what object was detected, so we need the name of the object, and we also need to organize everything into arrays so that we can extract it all together later. So let's do this operation now. Just above the loop we create the arrays: boxes, an empty array; confidences, also an empty array; and class_ids, also empty. Now we can fill them right here at the end of the loop: boxes.append([x, y, w, h]), because into the box we want to put all the rectangle values; confidences.append(float(confidence)), because we want to keep how confident the detection of that object was, so we can show a percentage on the screen if we want; and class_ids.append(class_id), so that we know the name of each object we detected.

OK, now everything is stored correctly in the arrays: for each object we have the class, the confidence, and the box with the position. The length of the boxes array will tell us how many boxes we detected, so we can print len(boxes); let's wait a couple of seconds for the execution of the code... we have 6 elements detected. Then we loop through them: for i in range(len(boxes)).
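The bookkeeping step just described can be sketched with two invented detections; only the list handling is shown, since the real values come from the network:

```python
boxes, confidences, class_ids = [], [], []

# Two made-up detections: (box, confidence, class id).
parsed = [((320, 210, 160, 180), 0.80, 56),
          ((100, 120, 60, 90), 0.65, 63)]

for (x, y, w, h), confidence, class_id in parsed:
    boxes.append([x, y, w, h])
    confidences.append(float(confidence))
    class_ids.append(class_id)

print(len(boxes))  # 2 objects detected
```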
Inside the loop, i is the index. Let's now get the coordinates of the rectangle where we'll show the box: x, y, w, h = boxes[i]. Then the label, the name of the object: from classes we take the entry at class_ids[i], the specific object that we are looking at. Let's first print this label, just to make sure that everything is correct: OK, TV monitor, laptop, chair; the detection looks OK, so at least these elements are all on the screen, which is already a good sign. And let's wrap it in str(), because when we show it on the screen we need to be sure it's a string. Now let's draw the rectangle: cv2.rectangle on img, from (x, y) to (x + w, y + h); for the moment we're going to make every object on the screen green again, with the same thickness.

This time let's also put the text on the screen with cv2.putText. We want to put the text on the image; the text will be the label, so the name of the object; then the position, let's say x is fine and y + 30, so the text will not be on the same line where we're drawing the top of the rectangle; then we need to define the font, the size of the text, the color, black for the moment, (0, 0, 0), and the thickness of the text, let's make it 3. We define the font with font = cv2.FONT_HERSHEY_PLAIN, and now I can run the script. OK, it's not that nice yet, but we can see that here it says "chair", here "tvmonitor", here "laptop" (two times), and here "book".

What you might have noticed is that we have the laptop and the TV monitor two times each. So we are going to introduce right now a new function called non-maximum suppression, which has the purpose of removing the double boxes.
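The label lookup inside the drawing loop can be tested without OpenCV at all; the class list, boxes, and ids below are small stand-ins, and the actual cv2 drawing calls are left as comments:

```python
classes = ["person", "chair", "laptop", "tvmonitor", "book"]  # stand-in list
boxes = [[320, 210, 160, 180], [100, 120, 60, 90]]
class_ids = [2, 3]

labels = []
for i in range(len(boxes)):
    x, y, w, h = boxes[i]
    label = str(classes[class_ids[i]])
    labels.append(label)
    # cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    # cv2.putText(img, label, (x, y + 30), font, 3, (0, 0, 0), 3)

print(labels)  # ['laptop', 'tvmonitor']
```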
It uses a threshold to remove what we can call the noise: if a box is on top of another one, within some threshold, it's most likely that they are only one object. So let's apply this function right now. I delete the number-of-objects print, since we don't care about that one, and then indexes = cv2.dnn.NMSBoxes(...), to which we need to pass the boxes, the confidences, and then two thresholds: one is the score threshold and the other is the NMS threshold. Let's start with the standard values, and later we will see whether it's necessary to change them or not.

What you see is that here we get only indexes, so let's print them. As you know we have six objects, 0, 1, 2, 3, 4, 5, while here we have only four indexes, 1, 2, 3, 5. It means that 0 and 4 were most likely the second laptop and the second TV monitor that were detected. What this function is saying is: you need to take only these indexes. So it will look like this: right here, if i in indexes, then in that case we show the box, otherwise not. Let's wait a few seconds... and here is the detection. Nice.
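A minimal sketch of what cv2.dnn.NMSBoxes does, assuming boxes in [x, y, w, h] form: keep the highest-confidence box, drop any box that overlaps it beyond an IoU threshold, and repeat. This toy version ignores the score threshold for brevity, and the boxes below are invented:

```python
def iou(a, b):
    # Intersection-over-union of two [x, y, w, h] boxes.
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def nms(boxes, confidences, nms_threshold=0.4):
    # Greedy suppression: visit boxes in decreasing confidence,
    # keep a box only if it overlaps no already-kept box too much.
    order = sorted(range(len(boxes)), key=lambda i: -confidences[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep

# Two nearly identical "laptop" boxes and one separate "chair" box.
boxes = [[100, 100, 200, 150], [105, 102, 200, 150], [400, 300, 120, 180]]
confidences = [0.9, 0.75, 0.8]
print(nms(boxes, confidences))  # [0, 2] -- the duplicate box 1 is dropped
```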
OK, let's finish quickly. What we are going to do right now is, of course, have better text for each object, but also assign a specific color to each class: so all the chairs will be, for example, green, all the laptops blue, all the TV monitors red, and so on. Using NumPy, at the beginning, where we create the classes, we also create the colors: colors = np.random.uniform(0, 255, size=(len(classes), 3)). As you know, each channel, blue, green, and red, goes from 0 to 255, so we put a random value in each channel, and the size is the length of the classes, so we generate as many colors as we have classes: 80 colors, because there are 80 objects we can detect, and then 3, because 3 is the number of channels; a color is made of three channels, how much blue, how much green, and how much red. Once we have the random colors we can extract them: after the label we say color = colors[i], and we use this color for the rectangle and for the text. Let me also increase the size of the text to 3, and wait some time... OK, now it's better than before: we have the chair, the laptop, the TV monitor, and the book.

Overall I can say that the detection was done quite well, even if we could still improve it and get other objects: more books, probably the mouse, the red ball, and maybe something else. We can do that by changing the values of the thresholds and maybe using a bigger image, but we might see that in another video, and we might also see in another video how to apply YOLO in real time. I hope this video was clear. If you want to know more about YOLO with video in real time, please stay updated with my channel, because I will release that video. That's all; good luck with the detection.
Info
Channel: Pysource
Views: 189,537
Rating: 4.9243832 out of 5
Keywords: yolo, object detection, deep learning, opencv, python, darknet, dnn, cv2.dnn
Id: h56M5iUVgGs
Length: 36min 56sec (2216 seconds)
Published: Thu Jun 27 2019