YOLO Object Detection Using OpenCV And Python | Python Projects | Python Training | Edureka

Captions
[Music] Hello everyone, this is Junit here from Edureka, and I welcome you all to this session in which I'm going to talk about object detection using the YOLO algorithm and OpenCV. So without any further delay, let's take a look at today's agenda. We'll start this session by understanding what computer vision is and why we need it. Then we shall discuss both the machine learning and the deep learning approach to object detection. Moving ahead, I'll be speaking about how OpenCV can be used to pre-process our images. Finally, we'll discuss both the YOLO and CNN algorithms for object detection and see how we can implement them. Before we begin, consider subscribing to our channel and hit the bell icon to stay updated on trending technologies. Also, if you are looking for online training certification in Python, check out the link given in the description box below.

To start off, let us now understand what computer vision is and why we need it. So what is computer vision? Well, you see, computer vision basically deals with anything that humans can see and perceive. There are so many tasks that we humans do subconsciously that we hardly think are even worth mentioning. However, for a computer to learn to perform, or even try mimicking, such things is very difficult. To give you a better understanding of what I'm talking about, imagine looking outside a window. What do you see? You might be sitting in an office building and seeing the traffic outside, right? So do you ever wonder how you recognize something or someone? Do you know how you can look at someone and know who they are? Well, you see, subconsciously we are identifying the objects in any image we see. Then we try to find what relation exists between the objects to identify the scene or the place. Only then do we get an idea of what is happening in an image. Sometimes we can also look at an incomplete image and use our knowledge and previous experience to determine what is missing from it. All of these are the tasks of computer vision. Speaking about the different tasks of
computer vision, we can perform various tasks like object detection, object classification, image captioning and image reconstruction. I'm sure you might be wondering what these tasks do, right? So let's now discuss each of them in brief. Object detection is the ability to detect or identify an object in any given image correctly. Then we have image classification, which basically means identifying what class the object belongs to. Image captioning is nothing but looking at an image and describing what is happening in it. Last but not least, we have image reconstruction. In image reconstruction, we basically have the ability to identify what is missing in an image in order to reconstruct it.

Now that we have gone through object detection and gained knowledge of what we can do with it, let's now see how it works. There are two main approaches to object detection, that is, the machine learning approach and the deep learning approach. Both of these approaches are capable of learning and identifying objects, but their execution is very different.

Starting off with the machine learning approach: machine learning is the application of artificial intelligence for making computers learn from the data that is given to them. Here they can make decisions on their own, similar to human beings. It gives the computer the ability to learn and make predictions based on the data and information that is fed to it through real-world interaction and observation. Machine learning is basically the process of using algorithms to analyze data and then learn to make predictions and determine things based on the given data. Machine learning methods for object detection are SIFT, then we have the support vector machine, and then we have the Viola-Jones object detection framework.

Moving on to the next method, that is, deep learning. Deep learning, which is also referred to as deep structured learning, is a class of machine learning algorithms. Deep learning uses a multiple-layer approach to extract high-level features
from the data that is provided to it. A deep learning model does not require any features to be provided manually for classification. Instead, it tries to transform its data into an abstract representation. Deep learning is also inspired by the neural networks present in our brain. Most deep learning methods implement neural networks to achieve their results. All deep learning models require a huge amount of computation power and a large volume of labeled data to learn the features from. Some of the deep learning methods for object detection are R-CNN, Fast R-CNN, the YOLO algorithm and Faster R-CNN.

Moving on, let's speak about an open source tool called OpenCV. OpenCV is a huge open source library for computer vision, machine learning and image processing, and it now plays a major role in real-time operation, which is very important in today's systems. By using OpenCV, we can pre-process images and videos to identify objects, faces, or even the handwriting of a human being. When it is integrated with libraries such as NumPy, Python is capable of processing the OpenCV array structure for analysis. As I mentioned earlier, we use the OpenCV tool to pre-process our data. Let me now quickly jump to my code editor and show you how I can use this to manipulate our images.

All right, as you can see here, I'm using Google Colab as a code editor. So let me quickly give a name over here as demo, and let me start off by writing our code. First off, you might be wondering how to install OpenCV, right? It's pretty simple. In order to install OpenCV, all you need to do is pip install opencv-python. All right, and after this, all you need to do is press Shift+Enter. As I'm using Google Colab here, it comes pre-installed in our environment, but if you're trying to install this on your own system, you can just run this in your command prompt. Now let's import OpenCV: import cv2. And also matplotlib: import matplotlib.pyplot as plt. And then let us also import NumPy, right? Okay, let me quickly execute
this. Fine. And let us now have an image over here. Let's give a variable, image, and let's see how we can read our image using OpenCV. So it's cv2.imread, and then we're supposed to pass a path. So let me now quickly upload our image over here. Okay, you can see I have a couple of images over here. Let me quickly select this image and upload it. Okay, let me copy the path for this and paste it over here. The reason I'm using a dot over here is because we are in the same directory. So let me execute this part over here. Okay, and let's see the type of this image. You will see that this gives us a NumPy array, right?

If you're trying to read an image using cv2, the image is going to be in the form of BGR, which stands for blue, green and red. But we don't want that. Okay, before I move ahead, let me just show you what the size and shape of this is. So image.shape, and let me hit enter. You will see over here we have the height and we have the width, and this is the number of channels. The reason why we have a number of channels is because it's a colored image, so it's going to be RGB. As we are using cv2 to read our image, we have to convert this back.

Okay, so let me quickly show you what our image looks like. In order to display our image, I'll use plt.imshow(image). Okay, so this is our image. Now you might be wondering, doesn't OpenCV have its own built-in method to plot an image? Yes, OpenCV has a built-in method to display an image, but unfortunately it is not supported by either Google Colab or Jupyter Notebook. And as I mentioned earlier, OpenCV reads the image in BGR format, so you can see the colors have been inverted over here. So let me quickly run this again. In order to change the colors, let me give it as new_image. So it's going to be new_image = cv2.cvtColor. This method takes an image over here, and then we're supposed to
pass which format we want to convert this to, so it's going to be cv2.COLOR_, and now we have to convert this from BGR to RGB, so you can see it over here. Let me quickly execute this, and before that let me plot this image, so plt.imshow(new_image). Okay, we are getting an error here, so let's see why we are getting this error and where we are going wrong. Okay, you can see the reason why we are getting an error is because we have not placed our comma over here, and it's going to be BGR. So it's going to be blue, green, red to RGB, that is, cv2.COLOR_BGR2RGB. Let me quickly execute this over here. Okay, let me rerun the image from the start, and let us now see how this image looks. So plt.imshow, and then we are going to put our new image over here. Fine, and let's see how it looks now.

So as you can see here, there's a huge amount of difference. Here we have a blue bus and a red bus behind it, and over here it's a yellow bus and a blue bus in the back. So this one is actually the original image, and this other one is what it looks like every time we pick it up from cv2.

All right, so let us now see some more pre-processing, or the basic operations that we can do. To start off, let's see how we can split our image, and when I say split our image, we're going to split all the different channels. If you can see over here, we have the shape as x, y, and then we also have three, right? So after splitting, each piece is going to be one channel. Let us now see how we can do that. We obviously have three channels, r, g and b, which stand for red, green and blue, and then all we need to do is cv2.split(new_image). All right, so let us now quickly execute this.
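
To make the channel-order point concrete, here is a minimal, dependency-free sketch of what the BGR-to-RGB conversion and the channel split do to the pixel data. The pixel values and the helper names `bgr_to_rgb` and `split_channels` are made up for illustration; in the notebook itself the same steps are done with `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` and `cv2.split(new_image)`.

```python
# Toy 1x2 "image" in OpenCV's channel order: each pixel is (blue, green, red).
# The values are invented for illustration; a real image would come from cv2.imread.
bgr_image = [[(255, 0, 0), (0, 0, 255)]]  # pure blue, then pure red


def bgr_to_rgb(image):
    """Reverse each pixel's channel order, which is what a BGR -> RGB conversion amounts to."""
    return [[pixel[::-1] for pixel in row] for row in image]


def split_channels(image):
    """Return one 2-D plane per channel, similar in spirit to cv2.split."""
    return tuple([[pixel[c] for pixel in row] for row in image] for c in range(3))


rgb_image = bgr_to_rgb(bgr_image)
r, g, b = split_channels(rgb_image)

print(rgb_image)  # [[(0, 0, 255), (255, 0, 0)]]
print(r)          # [[0, 255]]
print(b)          # [[255, 0]]
```

If the conversion is skipped, a plotting library that expects RGB reads the blue values as red and vice versa, which is why the buses swap colors in the plot.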
But before that, let's print the shape. So print, and we are going to have the shape of r, which is nothing but r over here, and similarly we'll do it for g and b. So let me copy this and change the values, so this is going to be green and blue. Let me quickly execute this now. Okay, we are getting the entire array here, so what we'll do is r.shape, and the same thing over here as b.shape, and even for g it is going to be g.shape. Let me execute this. So you will see here, we don't have any more channels, but if I print the shape of new_image, it's going to be (720, 1280, 3). All right, so this is all about splitting our image.

Now what do we do if we want to merge our image? It's pretty simple: we have cv2.merge for that. So let's give the same name, new_image. This is going to be cv2.merge, and we're going to pass our channels, and this will be in the form of a tuple, so it's going to be (r, g, b). All right, so this is how we can merge our image.

Let's see another operation, wherein we are going to resize the image. First off, we need a scale. Scale is nothing but the amount by which we want to resize, so let's give scale = 10. And then we also need the width, so w, which stands for width, and this would be an integer value. Then we'll have new_image.shape, and as we all know, shape gives us a tuple, so we are going to pick the value present at index 1. Then we'll scale this down by our scaling factor, that is, multiply by scale and then divide by 100. In the same way we are going to do it for the height, so the only things that change over here are h and the index, which becomes zero, and everything else remains the same. So now what we'll do is create a tuple here. We'll call this the dimension, dim, and this will be having width and height, and
now in order to resize, what we'll do is first have a variable here, resize, and this would be nothing but cv2.resize. Then we're going to pass the original image and then the dimension of the new image, and then we are going to pass our interpolation, which is nothing but an argument over here, and we'll have cv2.INTER_AREA. Fine, so let us now execute this, but before that let's print the size, so resize.shape. Okay, and let's execute this. So as you can see here, we have reduced our image size. The number of channels has not changed, but the values, which were 720 and 1280, have been reduced to 72 and 128.

All right, so let us now see one more operation that we can perform here, that is, the rotate operation. Similarly, we obviously need to have the height and the width, so it's going to be h and then w. Now we'll have new_image, and then we'll take shape, and we all know it gives us a tuple, so we'll take the values at zero and one. Now we'll have to calculate the center, so we'll just give a variable, and rather than center let's give it as c, and then we'll have w by 2 and then also height by 2, because that's how we calculate the center, right? And now let's say we want to rotate this by an angle of 90 degrees. So what we'll do is declare a variable, let's say M, and then we'll have cv2.getRotationMatrix2D. Now we are going to pass the center value, that's c over here, and then we'll have the angle, and at the same time we'll also have the scaling factor. We don't want to reduce the image size, so we'll give one over here. And now we have to rotate, so rotate_90, and this would be equal to cv2.warpAffine. Now we are going to pass our image, that is new_image, and then we'll have M over here, that is nothing but the object
that is returned over here, and then obviously the height and the width, so this is going to be a tuple value. Okay, and let's now see how this looks. So over here we are having an error, c is not defined, so let us quickly fix that. Okay, rather than c it's going to be 2 over here, because we are going to halve our height. Let me execute this now. Okay, so it has been successfully executed. We have rotated this by 90 degrees counterclockwise, so let's now just plot this and see how it looks, so plt.imshow. Fine, so as you can see here, we have rotated our image counterclockwise by 90 degrees.

So moving ahead, let us now see how a convolutional neural network works. So what is a convolutional neural network? A CNN, or convolutional neural network, is a class of deep learning neural network. What I'm trying to say here is, think of a CNN as a machine learning algorithm that can take in an input image, assign importance to the objects in it, and then differentiate one object from another. A CNN works by extracting features from the images. Any CNN consists of the following three things: an input layer, which here is a grayscale image; then we have the output layer, which gives the binary or multi-class labels; and then we have hidden layers, which contain convolution layers, ReLU, and also pooling layers. Finally, we'll have an artificial neural network in order to perform the classification. It is very important to understand the artificial neural network, or ANN, in order to perform multi-class classification.

Let me quickly move to the canvas and show you what the CNN architecture looks like. All right, so as I've mentioned earlier, when we are dealing with a CNN, we start with the input image. So consider that we have an input image over here, and this is a grayscale image, so the number of channels over here is going to be one. Let's take the image size to be 10, and the image size over here is also 10. Okay, in
reality we won't have an image size of 10, all right? This is just so that we can understand. And now what we'll do is perform convolution by passing a convolution filter, which would be of size three by three. This convolution filter will go through each and every part of this image and extract all the important features. So let's give this the name convolution layer. Now, we can have multiple filters in this convolution layer. Suppose we have 35 of them, and each of these 35 filters is responsible for extracting a very specific feature. So for example, let's think that we have 35 filters over here, and now the image will undergo convolution. We'll get a new image over here, and the new image would look something like this. The actual image initially had a single channel. After undergoing convolution, it will have 35 layers, or dimensions, I should say. So this would be 35, and the size over here will be reduced from 10 to 8.
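
The drop from 10 to 8 follows from the standard output-size formula for a convolution, floor((n + 2p - k) / s) + 1. The sketch below assumes stride 1 and no padding, which is what the whiteboard numbers imply; the function name `conv_output_size` is just an illustrative helper, not part of any library.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


side = conv_output_size(10, 3)    # the 10x10 input and 3x3 filter from the talk
num_filters = 35                  # one output channel per filter
print((side, side, num_filters))  # (8, 8, 35)

# With one pixel of zero padding on each side, the spatial size is preserved instead.
print(conv_output_size(10, 3, padding=1))  # 10
```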
So if you didn't understand this, let me quickly explain what's happening over here. These three cross three filters will go over each and every part of this image. So let's take the image as a matrix. It's ten cross ten, but just for the illustration I'm going to draw five cross five. So this three cross three matrix will fit over here. Then there will be a stride. Stride is nothing but how this matrix moves, and then this would be the second stride. So it will go through every three cross three patch, and then it will go throughout the image. Then we'll have an image over here which will reduce to eight cross eight, but then we'll have a larger number of features, that is, 32 features or 35 features over here.

Now, as you can see here, we have a huge number of dimensions, so this would lead to the curse of dimensionality. In order to overcome this, what we'll do is pass this through a max pool layer, and here we are not performing any complicated operation. This max pool layer is usually a 2 cross 2 matrix. What this max pool layer will do is go through each and every layer of this image. So let's think that this is the max pool layer, and this will sit over here on these four pixels. For example, let's say these have six, seven, nine and eleven. These are nothing but the feature values, and it will choose the maximum value from this part. So it's going to create a pixel, and this would be 11.
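
A minimal sketch of the 2x2 max pooling step just described, written in plain Python. It assumes a stride of 2 (non-overlapping windows), which is the common default; the 4x4 feature map is invented for illustration, with the top-left window holding the 6, 7, 9, 11 values from the example.

```python
def max_pool_2x2(feature_map):
    """Slide a 2x2 window with stride 2 and keep the largest value in each window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


feature_map = [
    [6, 7, 1, 0],
    [9, 11, 2, 3],
    [4, 4, 8, 5],
    [0, 1, 2, 6],
]
print(max_pool_2x2(feature_map))  # [[11, 3], [4, 8]]
```

Each side of the feature map is halved, while the number of feature maps (one per filter) stays the same because pooling is applied to each map separately.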
Now you might be wondering, why is it giving the value 11? Well, you see, as I mentioned, these are the feature values. As 11 is the highest value, it indicates that this is the place where that particular feature is most strongly present. Similarly, this will go through each and every box, and now the size of the image will reduce. So let me quickly show you by how much. After we pass it through the max pool layer, the size of the image will be reduced by half, so that's going to be 4 cross 4, but the number of filters will not change.

All right, now we can pass this through another convolution layer. All of these steps can be repeated again, but once we are done with that, we'll flatten this entire layer, and this is going to be a simple artificial neural network. What happens over here is that we are going to pass this through multiple layers, or I can say deep layers, and then we can perform our classification here.

Another important thing that I forgot to mention over here is that, in order to increase the non-linearity, we're going to also have an activation function over here. Usually I would be using ReLU. I hope you know what ReLU is, right? So what an activation function does is, after reaching a certain threshold, it will activate that part. This is what the graph for ReLU looks like: it is zero for negative inputs and grows linearly for positive inputs. And now, in order to perform the classification in our last output layer, I would be passing an activation function called softmax, because softmax gives me probabilities, which range from zero to one. All right, so this is all about convolutional neural networks.

Using deep learning, we can detect objects either by using the R-CNN model, which stands for region-based convolutional neural network, or by using the YOLO method. So moving ahead, I will be talking about the YOLO algorithm. Now you might be wondering, why do we have two different families for object detection, right?
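
Before moving on to YOLO, the two activation functions named above can be pinned down with their standard definitions. This is a small sketch using only the standard library; the input scores are arbitrary example values. ReLU returns zero for negative inputs and the input itself otherwise, and softmax turns a list of scores into values between 0 and 1 that sum to 1.

```python
import math


def relu(x):
    """Rectified linear unit: 0 for negative inputs, x otherwise (unbounded above)."""
    return max(0.0, x)


def softmax(scores):
    """Convert raw scores into probabilities that lie in (0, 1) and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


print([relu(x) for x in (-2.0, 0.0, 3.5)])  # [0.0, 0.0, 3.5]

probs = softmax([2.0, 1.0, -1.0])           # arbitrary example scores
print(probs.index(max(probs)))              # 0, the class with the largest score
```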
Well, you see, there's a lot of difference between the YOLO family and the R-CNN based approach. The R-CNN based approach focuses mostly on dividing an image into parts and then assigning probability values to those parts, and whichever part has the highest probability is where we consider an object to be present. Whereas the YOLO framework focuses on the entire image as a whole and predicts the bounding boxes, and then calculates the class probabilities to label the boxes. The YOLO family of frameworks is very fast compared to R-CNN.

The YOLO algorithm has evolved over the years. It first started with YOLOv1. This model is also called YOLO unified, and the reason behind this is that it unifies the object detection and classification models together as a single detection network. This was the first attempt to create a network that can detect objects in real time. To be very fast, YOLO only predicts a limited number of bounding boxes. The YOLO algorithm has improved over the years. Now we have YOLOv2, and the latest version of YOLO is YOLOv3. You see, the YOLOv1 framework makes several localization errors, and YOLOv2 improves on this by focusing on recall and localization. YOLOv2 uses batch normalization, anchor boxes, a high-resolution classifier, fine-grained features and multi-level classification, and it also uses something called Darknet. All these features made YOLOv2 better than v1. Speaking about Darknet, Darknet is actually a pre-trained model, and here YOLOv2 was using Darknet-19, which means it contained 19 convolution layers, 5 max pool layers and a softmax layer for object classification.

The latest model of YOLO is YOLOv3. This model is the fastest and most accurate object detection model of the family. It classifies the objects by using logistic classifiers, compared to the softmax which was used in YOLOv2, and this makes it capable of multi-label classification. YOLOv3 uses Darknet-53 as a feature extractor. You see over here, YOLOv3 makes use of Darknet-53, which means that there are 53 convolution layers. As a result of this, it can make more accurate predictions of an object.

Now that we know what object detection is, what the YOLO algorithm is, and how to work with OpenCV, let me quickly move to my code editor and show you an application to detect objects in an image as well as in a video. All right, so as you can see here, I'm in my code editor, and as I've mentioned earlier, I will be using Google Colab. So first off, let me import all the important libraries here. I have imported cv2, numpy, matplotlib and so on. Okay, so let me execute this.

Let me quickly move to another tab and show you the official website of the YOLO algorithm. Okay, this is the official website for the YOLO algorithm, and as you can see here, we have multiple versions over here, and then we have YOLOv3 and its various variants. In order to use this YOLO algorithm, we need to download the weights file and also the configuration file. The configuration file is basically hosted in a GitHub repository, so let me quickly show you that. As I've already downloaded this, I won't be downloading it again on my system, but it's pretty simple: all you need to do is click on weights and it'll automatically download to your folder. As you can see, I've already downloaded the weights file as well as the configuration file.

All right, so let me upload these files from my folder. As you can see here, we have coco.names. coco.names is nothing but the classes that the model has been pre-trained on. So we'll have coco.names, then we'll have the configuration file as well as the weights, and let me quickly open this. One thing that I would like to mention over here is that we have a couple of flavors of YOLOv3: over here we have 320, 416, 608, tiny and spp. Tiny is the one whose file size is pretty small. As I'm using Google Colab over here, I'll be
using tiny, because I'm supposed to upload this file to this environment over here.

So the first thing that we are going to do is get these files loaded over here. Let's give the name of the variable over here as net, or network, or we can give it as yolo. This would be cv2.dnn, which stands for deep neural network, dot readNet. Here we're going to pass our weights as well as the configuration. So let me quickly copy the path for this and paste it over here like this. Similarly, we are going to copy the configuration file path, and let me minimize this and paste it over here. All right, so let's wait for 2-3 minutes because this is still uploading, and once that's done we can execute our code over here.

All right, so now that we have uploaded our weights file, let me quickly execute this, and now let us import the classes. As I mentioned earlier, our COCO file contains the list of names that our algorithm is capable of detecting, and as it is a pre-trained model, there are a number of things we can detect: cats, dogs, horse, sheep, cow, elephant, bear and many more. Let me now quickly import that and put them in the form of a list. I've already imported matplotlib over here, so let's create classes, and this would be an empty list. Now, as it's a file, we'll use file handling: with open, and we'll give the path of the file, so let's copy the path over here, and then we'll also define the type of operation that we are doing. We are going to do a read operation, as f. And now classes would be nothing but f.read(), and as this will give the entire page and we want to split it into lines, we'll have splitlines(). All right, so let me quickly execute this over here. Okay, so now let us see
what is there for us in our classes. So let me have len and then pass our classes, and it should show around 80. If you want to see what classes contains, it contains the names of the different objects that we can identify. All right, so let me change this back to the length and execute this.

So now that we have our classes and we have loaded our model over here, let's load our image. Let's take the same old image, this bus image over here, and in order to load our image all I'm going to do is cv2.imread and pass the path over here. So I'll copy the path and just paste it over here, and let me change this back to the directory. All right, and we all know that this would be read in BGR format and we have to convert this to RGB format, so let's quickly get that done. We'll give it as new image, or let it be blob. Okay, blob, and this would be nothing but cv2.dnn.blobFromImage. Now what we're going to do is pass our image over here, and then, as these pixel values would obviously be in the form of integers, we have to convert them to float. So we'll do the normalization by dividing all the pixels by 255. This will ensure that the values range from zero to one.

All right, and now we obviously have to define the shape in case the image is pretty huge. So consider that this image over here is, say, thousand cross thousand. Now we have to rescale this image, because processing such a huge image would take a huge amount of computation power. So what we're going to do is resize this image on both the x axis and the y axis to 320. So basically we're trying to resize this, so let me quickly mention this over here. Okay, so this is where we can add our resized image size, so it's
going to be 320 cross 320. All right, and then we'll have to pass the mean as (0, 0, 0), and then an argument over here which says swapRB=True. The reason why we are using swapRB is because we all know this is going to be read in BGR, and we obviously have to interchange our R and B, so we'll have True over here. And then finally we'll be passing crop. We don't want our image to be cropped, so it's going to be False.

All right, so let me quickly execute this line. Okay, we are having invalid syntax, and the reason for this is because I have typed a full stop rather than a comma. Let me execute this once again, and you can see here that it has run successfully. So let's see the shape and size of this, so blob.shape. This is still a NumPy array, so it's going to be shape. Okay, we are not getting the shape over here because I've misspelled blob, so it's going to be blob.shape. So as you can see here, the shape has changed.

If you want to plot this with matplotlib, all you need to do is reshape this, and it's going to be three channels over here. Okay, so let me quickly show you that as well. This is an optional part, to display the image. So what we're going to do is give a variable over here, let's give i, and what this i will contain is blob[0].reshape(320, 320, 3). All right, and now we'll just plot i. If we try to plot the blob directly, it's not going to plot, and the reason is because it's not in the correct format. So let me quickly execute this now. If you can see here, we have our image divided into three channels, that is, the R, G and B channels, and that's the reason why we are getting three images over here.

So now we have this blob, and we are supposed to set this blob as the input image. In order to do that, what we'll do is yolo, okay, this
is the model that we have loaded from outside. So we'll set the image, that is setInput, and then we'll pass blob, and let me execute this. All right, and now we also have to define our output layers, so output_layer_names, and this would be yolo.getUnconnectedOutLayersNames(). Now what we'll do is run the forward pass to get our output, so layer_output, and this is going to be yolo.forward, and we are going to pass just the output layer names. All right, so let me copy this over here and paste it here.

So all that is left over here is that we need to read the output, we need to find where the bounding boxes are, and then we need to put the bounding boxes on our image. In order to capture our bounding boxes, and there can be multiple boxes in our image, we'll create a list. So this will be boxes, as an empty list. Then we'll have confidences, in the sense of the confidence with which the value is being predicted, and this would be a list as well. And then we'll have class_ids, and the class id over here refers to the classes that we have imported.

So we'll run through the output: for output in layer_output, and then we'll have for detection in output. Now we'll first capture the score, and this would be taken from detection. Score is nothing but an array. What this output will give is, for every object that it has detected, an array over here indexed from 0 to 84. The first four values are responsible for the position of our box, so here it gives the x and y center and the other parameters, and the rest of it covers whatever classes we have. Like, we have 80 classes, and it gives a confidence or probability of that particular class being in that box. So we'll calculate the score over here by excluding the first entries, so we'll have 5 over here, that is detection[5:], and then
we'll also capture the class IDs, so class_id, that is np.argmax of score. Okay, and let us also get our confidence over here: confidence, this is nothing but score, and then we'll pass the class_id over here. Okay, so if you want me to explain what is happening over here: see, we are getting this list, right? So by detection we are going to get a list. Okay, and what this class_id is doing is, we're gonna get the place, or the position, in that array where there is the maximum probability, and what this confidence will do is just get whatever value is there, or the percentage of that probability. Okay, so now moving ahead, in order to prevent multiple bounding boxes, we'll have confidence; all we're trying to do over here is set a threshold: if the confidence is greater than 0.7. Okay, and then what we'll do is extract all our features. So we'll have center_x; we obviously need an integer value over here, so int. Detection, as I mentioned, the first four values are responsible for giving us the x, y, height and width, so it's going to be detection[0] times width. All right, and now similarly we are going to do it for y, height and width. Okay, so let me copy this again and paste it here; this is for the height and then for the width. So let me just change the name to width, and then we'll have height. All right, and then we also need to mark height over here, and similarly we're going to do it over here. And now finally, to find the corner value of x, we'll have x, which is nothing but int of center_x minus w by 2. All right, and similarly we'll do it for y, which is equal to int of center_y; we're using these x and y values in order to find the corners, right? So we'll similarly find it for y, which is nothing but center_y minus height by 2. All right, and now finally all we need to do is append these values to these boxes over here. So what we'll do is boxes, that is nothing but this list, so it will be box, should be, yes, that
shouldn't be an issue. Dot append, and we'll pass a tuple value here of x, y, width and height. Similarly we are going to pass the confidence values, that is nothing but this list, confidences, so we'll append the values to the list: dot append, and then all we're going to do is float of the confidence value which is present over here. Okay, and finally we need to have class_ids, right? So let me get this as well, and then all we are going to do is append, and then class_id. All right, so now that we are done with this, let me quickly execute this block of code over here and let's see if we get any error or not. All right, so as you can see here, we have successfully executed our code. Now what we'll do is see how many bounding boxes are there, and to do that we'll just print the number of elements that are present in the list. Okay, so what we'll do is len of boxes. Okay, and this should give us, okay, two; that means in whatever image we are passing, it is able to detect only two objects. All right, and now we obviously have to add these bounding boxes to our image, right? So we'll do indices, and this will be cv2.dnn.NMSBoxes. All right, and now we'll pass the boxes, that is nothing but the values over here, and then we'll pass the confidences. All right, and then we'll also pass 0.5 and then 0.4. All right, so let me quickly execute this. All right, so we are getting an error; that's because I have to put a comma over here. Let me execute this now. All right, so now we have to add the font and confidence to our image, right? So we'll give font, and this is nothing but cv2.FONT_; you can choose any one over here, I'm going to use HERSHEY_PLAIN. Right, so we have that over here, and then we'll pass colors, that is nothing but c-o-l-o-r-s, and this is nothing but np.random. Okay, we don't want the same color, right? We
want to pass random colors for each of our bounding boxes, so we'll give the range 255, and then size; this would be equal to the number of bounding boxes, so size would be the length, and then we'll also pass three over here. All right, perfect. So now all that is left is to add these to the image. Okay, so let's see what the issue is here. All right, so the reason why we are getting this is because I forgot to mention uniform over here, so let me quickly mention that, dot uniform, and let me execute this now. Perfect. Now what we'll do is take each and every object over there and add bounding boxes to it, and in order to do that we'll use a for loop: for i in indexes.flatten(). All right, and now we have to take our x, y coordinates, so x, y, width and height; this would be nothing but boxes[i], for that particular object, right? And now let's have a label, so here we are basically trying to add a label. This is going to be str of classes; I hope you know from where we are getting these classes. We are getting these classes from here: we imported this coco.names, right, so we are importing the classes from there. All right, and then all we are going to do is pass class_ids, and then we'll have the list index. Fine. And similarly we are going to do it for confidence, so we'll convert this to a string, and we obviously don't want a long floating point number, so we'll round it off: round, confidences, and then let's have it to two decimal places over here. And now for the color, yep. So what is happening over here is we are just trying to extract each and every piece of information from the lists that we created above. And now all we'll do is add rectangular bounding boxes; to do that we have cv2.rectangle, and then we'll have to define the image, the coordinates, that is x, y, and then we obviously have to define the opposite corner, right, so it's going to be x plus width, then we need y plus height, and the color, and then we'll have one; the one over here refers to what is the
size of the rectangular box you want, that is, its line thickness. And now in a similar way let's add the text, so cv2.putText; obviously we want it on the image. All right, and now we have to pass all of these parameters, so it's going to be label, and then we'll give some space and then the confidence value, right? And then we'll also pass a couple of parameters, like where we want it; we want it in the top left corner, so it's going to be x and then y plus 20. And then we'll have font; this selects which kind of font you want, then the font size, and the background, so we'll keep a white background over here. Okay, so this is not the background, this is what color of text you want, so it's going to be 255, 255, 255, so it's white, and then the size, as with the bounding box. So let me quickly execute this and see what happens, whether we get any error or not. All right, so as you can see, we have no errors over here. So what we'll do is try saving this image, or we can also see it over here. In order to see this, all we need to do is plt.imshow and pass the image name, and then we can see the image. All right, so what's happening over here is it's trying to read the original image, right? So let me now quickly show you if we have any bounding boxes in our image. As you can see here, right, there is a thin box which says person with 99 probability, but we are unable to see it clearly. So what I'll do is just increase the size of this over here and rerun this from the beginning. All right, so let's see what happens. Okay, so as you can see here, it's detecting this person with 89 probability, right? So let's do one thing: let's take one more image and see what happens. [Music] So let's take this image over here and let me quickly upload this. Okay, so let's copy the file path. All right, so let me quickly just change this value over here and give this as the root directory, so all I have to do is dot, and let me execute this from the start, right? So let me execute this. So as we can see here, we
have a couple of images with bounding boxes. So in order to look at these bounding boxes clearly, let me save this image so that we can zoom in, right? In order to save the image, all we need to do is cv2.imwrite. All right, give the path where you want to save this, so let's give image.jpg, and then which image you want to save, right, so it's going to be img over here. Let me quickly execute this. All right, so if you're getting True over here, this means that our image has been saved, and let me quickly open this up for us. So as you can see here, this is detecting a person, and as there are too many people over here, it's gonna detect a lot of people, right? So that's why it has given a single bounding box and it has detected a person over here. All right, as I mentioned earlier, using OpenCV we can take three types of inputs, right: one is a webcam feed, another is images, and another is a video. The procedure remains the same; the only thing that you're going to change is how you take the input, and you put all of this in the form of a for loop. All right guys, with this we have come to the end of our session. I hope you enjoyed it and learned something new. If you have any further queries or doubts, please do mention them in the comment box below. Until next time, goodbye and take care. I hope you have enjoyed listening to this video. Please be kind enough to like it, and you can comment any of your doubts and queries and we will reply to them at the earliest. Do look out for more videos in our playlist and subscribe to the Edureka channel to learn more. Happy learning.
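The detection loop described in the walkthrough (take the class scores from index 5 onward, pick the best class, keep it if the confidence is above 0.7, then convert the centre-based box to a top-left corner) can be sketched roughly as below. This is only an illustrative sketch, not the presenter's exact notebook code: the function name decode_detections, its parameter names, and the tiny three-class sample rows are assumptions made up for the example, and plain Python lists stand in for the NumPy arrays that yolo.forward would return.

```python
def decode_detections(layer_output, img_w, img_h, conf_threshold=0.7):
    """Turn raw YOLO-style rows into boxes, confidences and class ids.

    Each row is assumed to be [cx, cy, w, h, objectness, score_0, score_1, ...]
    with the first four values expressed as fractions of the image size.
    """
    boxes, confidences, class_ids = [], [], []
    for output in layer_output:          # one entry per output layer
        for detection in output:         # one row per candidate object
            scores = detection[5:]       # class scores follow the first five values
            # Plain-Python equivalent of np.argmax(scores)
            class_id = max(range(len(scores)), key=lambda k: scores[k])
            confidence = scores[class_id]
            if confidence > conf_threshold:
                center_x = int(detection[0] * img_w)
                center_y = int(detection[1] * img_h)
                w = int(detection[2] * img_w)
                h = int(detection[3] * img_h)
                # Convert the box centre to its top-left corner
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    return boxes, confidences, class_ids


# Hypothetical sample: one output layer, two rows, three classes.
sample_output = [[
    [0.5, 0.5, 0.25, 0.5, 0.9, 0.05, 0.9, 0.05],  # confident detection of class 1
    [0.1, 0.1, 0.1, 0.1, 0.3, 0.2, 0.3, 0.1],     # below the 0.7 threshold, dropped
]]
boxes, confidences, class_ids = decode_detections(sample_output, img_w=640, img_h=480)
print(boxes, confidences, class_ids)
```

In the video, the three lists produced this way are then handed to cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4) so that overlapping boxes for the same object are suppressed before drawing.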
Info
Channel: edureka!
Views: 42,892
Keywords: yt:cc=on, yolo object detection tutorial, yolo object detection python, yolo object detection demo, object detection using yolo, yolo object detection from scratch, yolo python, yolo detection model, yolo object detection, object detection yolo, object detection using python, yolo object detection opencv python, object detection python, object detection using opencv python, opencv python projects, python projects, yolo python opencv, edureka python, Edureka
Id: b59xfUZZqJE
Length: 47min 20sec (2840 seconds)
Published: Mon Mar 08 2021