Python: Real Time Object Detection (Image, Webcam, Video files) with Yolov3 and OpenCV

Captions
In this video we are going to learn how to perform object detection in Python, and to be specific, we are going to apply YOLO object detection with the use of OpenCV. You Only Look Once (YOLO) is a state-of-the-art real-time object detection system. YOLO is a deep learning algorithm which came out in May 2016 and quickly became popular because it is so fast compared with other deep learning object detection models. Traditionally, region-based convolutional neural networks (R-CNN) apply regions to localize objects and perform object detection, which means the model is applied to multiple regions within an image: the model computes scores at multiple locations and scales, and high-scoring regions of the image are considered detections. YOLO, on the other hand, uses a totally different approach: instead of selecting regions, it applies a single neural network to the full image and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities, and if a probability is higher than a threshold that can be set by the user, the object is considered detected. Since it only scans an image once to make its predictions, as compared to other algorithms which require multiple scans, it is faster in practice, and that is why it is called You Only Look Once.

The latest version of YOLO is YOLOv3. YOLOv3 uses a few tricks to improve training and increase performance, including multi-scale predictions, a better backbone classifier, and a few more minor techniques. This recent version is more powerful than the basic YOLO and also YOLOv2, and in this video we are going to apply YOLOv3. YOLOv3 is extremely fast and accurate: as shown in the picture, at mAP measured at 0.5 IoU, YOLOv3 is about 4 times faster than the other algorithms. You can also easily trade off between speed and accuracy simply by changing the size of the model, without retraining the model again.

Our input can be in three forms: image files, a webcam feed, and video files. With the use of the pretrained YOLO model, it can detect up to 80 objects such as person, bicycle, car, motorbike, aeroplane, bus, etc. If you want to know more about that, you can refer to the COCO files for details, and we will discuss where to download the files later.

Before we move on, I want to introduce the concept of transfer learning, which is a very important and interesting benefit of applying deep learning networks. Because we are often solving different but somehow similar problems, to take advantage of others' work and to speed up our training process, we can usually reuse, partly or wholly, pretrained networks to accelerate our own training and solve our own problems. In deep learning this concept is called transfer learning. It means that we reuse the weights of one or more layers from a pretrained neural network model in a new model, by keeping the weights, fine-tuning the weights, or adapting the weights entirely when training the new model. In YOLO we are applying a similar concept: we simply download the weights and configuration of YOLO, download the names file which is called coco.names, and use the deep learning framework in OpenCV that is compatible with YOLO. The advantage is that it works without the need to install anything except OpenCV, and one friendly reminder is that the version has to be at least 3.4.2.

So first of all we can visit pjreddie.com to download the weights and configuration files. There are five models that you can select based on your preference: for example, if your concern is speed, you can pick the highest frames-per-second (FPS) model, which is YOLOv3-tiny, but if you want higher accuracy you can pick YOLOv3-416 or YOLOv3-608. The weights file is the trained model, which is the core of the algorithm for detecting objects, and the configuration file (.cfg) holds the settings of the YOLO algorithm. Then we can download the names file from GitHub; the names file contains the names of the objects that the YOLO algorithm can detect, in other words the 80 object names, the names of the classes that the pretrained model can classify. Finally we can simply open a terminal to pip install opencv-python, put everything inside one folder together with the program that I'm going to walk you through, and we are all set and ready to start.

Now, first things first, we need to import cv2, and we also need to import numpy as np. We then need to load the YOLO weights, the configuration, and the object names; everything is under the same folder as shown in the slide deck that I showed you before. OpenCV provides a function to load the weights and configuration files without the need to convert them, which is very convenient: you don't need to analyze them or write your own loading function, and the function returns a model object that we can use later on for predictions. So first we just create a variable called net and then use the OpenCV function cv2.dnn.readNet, pass it the yolov3.weights file, and also pass the other parameter, which is the yolov3.cfg configuration file, and that's it: this network now contains the YOLOv3 weights and the YOLOv3 configuration. The next thing I'm going to do is extract the object names from the coco.names file and put everything into a list. So let's create a list first, called classes, and then with open we open the coco.names file and read everything inside it: we use it as f and read everything into the classes list using the .read().splitlines() function, and now we have read everything inside this file into the classes list.
We can print the classes to see what's inside this classes list, and you can see that we just extracted the object names into it: person, bicycle, motorbike, aeroplane, and so on, and that should add up to 80 object names. After loading the network as well as the object names, we are ready to load the target image that we want the network to run object detection on. We can use the OpenCV function cv2.imread to load the image; I already have an image called image.jpg, and then we can extract the height and also the width, which will be used as scaling lengths after the detections. Let me show you the image we have, with cv2.imshow('Image', img), cv2.waitKey(0) and cv2.destroyAllWindows(); let's take a look at the image first. In this image there are three persons on the left-hand side and one person on the right-hand side, there are two laptops, a few plants, one cup, two chairs and one bicycle, so let's see how many objects can be detected with the use of YOLOv3.

Once we have loaded the image, we then need to prepare it and convert it into an input image that can be fed into YOLOv3. First we need to resize the image into a square 416-by-416 image that fits the input size of the YOLOv3 model, and we also need to normalize it by dividing the pixel values by 255. The values are also expected to be in RGB order, but right now the image is in BGR order, so we need to swap it. So here we prepare the image as a blob by using the cv2.dnn.blobFromImage function: we first input the image; the second parameter is the scaling factor, which is what I just mentioned, that we are going to normalize by dividing the pixel values by 255; then the size of the image, which is (416, 416); then we are not going to do any mean subtraction, so we put (0, 0, 0) here; then swapRB, because we are going to convert the BGR image into RGB order, so we assign it as True; and we are not going to crop anything, so we assign False here.

Let me show you what we've just done with the blobFromImage function: for b in blob, and for n, img_blob in enumerate(b), cv2.imshow(str(n), img_blob). So the cv2.dnn.blobFromImage function returns a blob, which is our input image after the mean subtraction, the normalization and the channel swapping, and at the end you can see three images that contain the red channel, the green channel and the blue channel. In other words, cv2.dnn.blobFromImage is used to create a 4D blob, which is the format the deep learning model accepts as its input, from the input image, and now we can use it as an input to our model. Let's remove these for loops for the time being.

As mentioned, our pretrained model is inside net, and we can use setInput, passing the blob as the input into this network. In order to get all the information we need from the predictions, like bounding boxes or predicted classes, we also need to get the output layer names and then pass those names into the forward function, which will do a forward pass through the pretrained model. So here we get the names first, into output_layers_names, with the use of net.getUnconnectedOutLayersNames(), and once we have the output layer names we can pass them to the net.forward function. Let me repeat: setInput is used to set the input from the blob into the network, and getUnconnectedOutLayersNames is just to get the
output layer names. By passing these output layer names into the net.forward function, we will get the outputs from this function. In other words, net.forward is used to run the forward pass and obtain the outputs at the output layers whose names we already provided. So now the results, the detections, are ready; however we are not done yet, because to visualize the result we need to extract the bounding boxes, the confidences and the predicted classes, which will be stored in three different lists. First let's initialize the lists: we create a boxes list to store the bounding boxes, a confidences list to store the confidences, and a class_ids list to store the class IDs, which represent the predicted classes.

In order for us to extract the bounding boxes, confidences and predicted classes, we need to create two for loops that help us loop over the layer outputs. The first for loop is used to extract all the information from the layer outputs, and the second for loop is used to extract the information from each of the detections within each output. Each of the detections here should contain 4 bounding box offsets, 1 box confidence score and 80 class predictions, meaning for each of the detections there should be 85 parameters. You can consider the first four elements to be the location of the bounding box, and the fifth element to be the box confidence, that is, the confidence score that reflects how likely the box contains an object and how accurate the boundary box is; the remaining elements are the class predictions, the probability for each class. Then we create a scores variable that stores all the 80 class predictions, that is, the elements starting from the sixth element to the end, and then we want to identify the class that has the maximum score, the highest score, in this scores array. So we use numpy's np.argmax to extract the location of the highest score, and then we use this index to pick out the maximum value from the scores, which is actually the confidence, or you can consider it the probability. Let me repeat: we just create an array called scores that stores all the 80 class predictions, then we find the location that contains the highest score, extract that highest score and assign it into the confidence variable, because we want to make sure that the prediction has a confidence that is high enough for us to consider that the object has been detected.

So here we add an if branch: if the confidence is larger than a certain threshold, and here I set 0.5, meaning the probability or confidence is larger than 0.5, we will start to locate the box. The predicted box is contained in the first four values of the detection variable, that is x, y, w and h, where x and y are actually the center x and center y coordinates of the object, and the size is the width and height, that is w and h. So here we just extract the values: center_x is the integer of the first element, center_y is the second element, w is the third element and h is the fourth element inside the detection. Remember that we normalized the image with the blobFromImage function before, so all of the values inside these detection arrays are normalized; in order to scale them back, we need to use the height and width that we extracted from the original image. So we multiply x by the width, y by the height, w by the width and h by the height, because YOLO predicts the width, the height and the center of
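The two loops and the scaling just described can be sketched in plain NumPy, including the upper-left-corner conversion that comes next in the video; the single hand-made detection row at the bottom is purely illustrative:

```python
import numpy as np

def parse_detections(layer_outputs, width, height, conf_threshold=0.5):
    """Extract boxes, confidences and class IDs from raw YOLO outputs.

    Each detection row has 85 values: 4 box offsets (center x, center y,
    width, height, all normalized to 0..1), 1 box confidence, and 80
    class scores. Boxes are scaled back to pixels and converted from
    center coordinates to the upper-left corner for OpenCV drawing.
    """
    boxes, confidences, class_ids = [], [], []
    for output in layer_outputs:          # one output per YOLO scale
        for detection in output:          # one row per candidate box
            scores = detection[5:]        # the 80 class scores
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > conf_threshold:
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                x = int(center_x - w / 2)  # upper-left corner
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(confidence)
                class_ids.append(class_id)
    return boxes, confidences, class_ids

# Illustrative fake detection: a box centered in a 416x416 image,
# 90% confident it is class 0 ("person" in coco.names)
row = np.zeros(85)
row[:4] = [0.5, 0.5, 0.25, 0.5]
row[5] = 0.9
boxes, confs, ids = parse_detections([np.array([row])], 416, 416)
```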
the bounding boxes. So we need to compute the upper-left corner position in order to draw them with the use of OpenCV: we assign x = int(center_x - w/2) and y = int(center_y - h/2) to get the position of the upper-left corner. Finally, we simply append all the information to the corresponding lists: for example, boxes.append with the values x, y, w and h, then confidences.append(float(confidence)), and class_ids.append(class_id). By now we should have all the information that we need, which includes the bounding boxes, the confidences and the predicted classes.

When we perform object detection, it often happens that we will have more than one box for the same object, so we will use a function called non-maximum suppression to keep only the highest-scoring boxes. We need to pass four parameters into this function: all the boxes, their corresponding confidences, a threshold that we are going to use to filter the boxes, and a threshold that will be used in the non-maximum suppression itself. First of all, let me show you how many objects are detected before suppression: we have 14 boxes. Then we use the function called cv2.dnn.NMSBoxes: the first parameter is the boxes that we have, then the confidences, then we need to set the score threshold, and it is better to set this threshold the same as the threshold that we set on the confidence; the last parameter is the non-maximum suppression threshold, where by default we use the value 0.4. Let me show you how many boxes are redundant: if we have a quick count on the indexes that are kept, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, we see that indexes 11 and 12 have been removed out of the 14, so two boxes were redundant. Finally, we can take all the
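cv2.dnn.NMSBoxes is a library call, but what it does can be sketched in a few lines of plain Python: greedily keep the highest-confidence box and drop any box that overlaps a kept one beyond the NMS threshold. This is an illustrative approximation of the idea, not OpenCV's implementation:

```python
def iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def nms(boxes, confidences, score_threshold, nms_threshold):
    """Greedy non-maximum suppression; returns the kept indexes."""
    order = sorted((i for i, c in enumerate(confidences) if c >= score_threshold),
                   key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        # keep this box only if it doesn't overlap an already-kept box too much
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping boxes and one separate box:
boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [300, 300, 50, 50]]
confs = [0.9, 0.8, 0.7]
print(nms(boxes, confs, 0.5, 0.4))  # → [0, 2]: the weaker overlapping box is dropped
```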
information and draw it on the picture. First we need to assign the font that we are going to use in the picture; for example, we just use cv2.FONT_HERSHEY_PLAIN. Then we need to assign a color to each of the objects detected; we just use a random color for each of them, and the colors should be between 0 and 255, so that is np.random.uniform(0, 255), and the size should be the length of the boxes that we have, that is at least 14, and because we have three channels the shape is (len(boxes), 3). Then we create a for loop to loop over all the objects detected, that is, for i in indexes.flatten(), and then we extract the x, y and the width and height from the boxes, we extract the label from the classes list at the corresponding class ID, we extract the corresponding confidence, and finally we assign the color for each of these boxes. So we create a rectangle: on the image, the corner is (x, y), the opposite corner is (x + w, y + h), the color is the color we picked, and the thickness of the rectangle is 2. I also want to put the text on this image, which should contain the class name and the confidence; for the location we can assign it at (x, y + 20), the font is the one we assigned at the beginning, the scale is 2, we give it a white color, and the thickness is 2.

Let me quickly go through this whole for loop. Here we just use a for loop to identify each of the objects detected, and then we extract the information back from the boxes, which contains the location and also the size of the rectangle that we need to show in the picture. On the other hand, we also extract the class IDs back; however, these IDs are just numbers, so in order to show the class names we go back to the classes list that we extracted from the coco.names file right here. So we get the
string: we use the string to get back to the label and assign it to the label variable; we also extract the confidence, put it into a string, and assign it to the confidence that we are going to show in the picture. We randomly pick a color for each of the objects, and finally we create the rectangle and also put the text at the upper-left corner in the image. Let's take a look at the results: there are in total 12 objects detected by YOLO here. You can see that YOLO can detect the three persons right here, but it missed the person here; two chairs are detected, two dining tables are detected, the laptops are detected, two plants are detected, as well as the cup here. What YOLO missed is the potted plant here, the bicycle here and the person here, but overall speaking, YOLO performed a very nice job here.

As mentioned before, our input can be in three forms, which include image files, a webcam feed or video files. Videos or webcam feeds are just a series of image frames that we are going to feed into this YOLO network for the object detection, so let's see how we can do that. There are only a few changes that we need to make in order for it to work with a webcam and also a video. Here we use cv2.VideoCapture to capture either the video or our webcam; for example, here I just capture a video, and then I comment out the image file loading. Now we should have our video ready; then I need to create a while loop and keep most of the code inside this while loop. From the video we will read each of the frames, and I just name each one img, so that is _, img = cap.read(). At the end we need to release the capture with cap.release(), and we also need to assign a waitKey here so that the while loop can be broken; for example, we use the Escape key to break the while loop, and
that's it for the changes. Let's see whether it works... it works properly, so that's good. Now we can apply YOLO for object detection on three types of input, which include image files and video files; for a webcam feed we just need to change the video file into 0, and that should work well.

Before we end this video, let's quickly go through what we've done within this program. First of all, we use cv2.dnn.readNet to read the weights and also the configuration from the YOLO files. Remember that there are five types of YOLO files and you can select what you want; for example, if you want very fast YOLO performance you can select the tiny YOLO, and if you want very high accuracy you can choose the 608 files. We use this OpenCV readNet function to read the network. We also download the COCO file, use the with open function to read all the classes, and put them into the classes list. You can choose either the camera or the image files, and if you are going to use a video file or webcam, we have to put everything inside a while loop. Here we just capture the video frame by frame, and for each of the frames we capture the height and width that we are going to use to scale back to the original image size. Then we use the blobFromImage function to create an input that we pass on to the setInput function, and we also need to use getUnconnectedOutLayersNames to get the output layer names in order to feed them into the forward function and get the output layers' outputs. Then we create two for loops in order to identify and extract the information for each of the identified objects, and since when we perform detections it very often happens that we will have more than one box for the same object, we need to use the NMSBoxes function, which actually performs non-maximum suppression and helps us keep only the highest-scoring boxes. Finally we have our last for loop, which helps us to parse all the information and draw it onto the picture or onto the video. One minor point here: if you want to keep the colors from changing on every frame, you can move the colors back outside the while loop.

And that's it. Thank you for watching! If you have any questions or suggestions about what we covered in this video, please feel free to ask in the comment section below and I will do my best to answer. If you enjoyed this tutorial, you can subscribe to my channel and simply like the video, and it is a great support to share this video with anyone who you think would find it useful. Thank you all for watching!
Info
Channel: eMaster Class Academy
Views: 84,105
Rating: 4.9862947 out of 5
Id: 1LCb1PVqzeY
Length: 43min 18sec (2598 seconds)
Published: Thu May 14 2020