YOLO Real time object detection on CPU

Captions

[Music] Hi, welcome to this new video tutorial. We're going to see how to run YOLO on a live video or a webcam using OpenCV with Python. There are different options for running YOLO, especially when we work in real time: one is on the CPU and the other is on the GPU. Let me quickly explain the differences between them and the advantages and disadvantages of each.

With the CPU there is only one really big advantage, especially for a beginner: we need only OpenCV, and YOLO is ready to go without any other installation. The big disadvantage is that it will be extremely slow, so I suggest following this tutorial only if you are a beginner and have never worked with YOLO before.

The second option is to run YOLO on the GPU, the graphics processing unit. The main advantage is that it is much faster than the CPU: you can reach really high speeds and real-time detection. The disadvantage is that it is really hard to install: you need to compile many of the libraries yourself, and depending on the operating system you will face different challenges. There are many things to keep in mind when working with the GPU, and we will most likely see that in another video. For the moment, let's stick with the CPU and start. [Music]

We're going to perform our detection using this script, which is where I explained how to detect objects with YOLO in a single image. If you haven't watched my other video about YOLO, I highly suggest watching it, since we are going to use the same code and you will have a clue of what we are going to do. This was the code; let's run it and see the detection we did last time. Last time I was detecting the objects in this room, and we correctly detected the chair, the laptop, the TV monitor and some of the books.

Let's now understand the difference between working with an image
and working with a video. In OpenCV it is almost exactly the same; the only difference is that with a video we simply work with more images, one after another. So we go to where we load the image, delete this part, and instead load the capture from the camera: cap = cv2.VideoCapture(0). We use 0 for the first webcam, 1 for the second webcam, and so on if you have more webcams. I'm using my first webcam, so I use 0.

Once we have the capture, we need to get the frames in real time, so we put them in a loop, while True, and then write _, frame = cap.read(). What we are doing is loading the capture from the camera; then, in the while loop, we take the first frame and execute the code below, then the loop starts again with the second frame, so we do the processing on each frame in turn. Since all of this has to happen inside the while loop, we put all the code below into the loop, up to cv2.waitKey: just select it and press Tab to indent it.

Now let's adjust this code. Instead of working with img we work with frame, so wherever we see img we put frame: where we draw the rectangle, where we write the text, and instead of showing img we show frame. Then in cv2.waitKey, instead of 0 we need to pass 1. The difference is that 0 keeps the frame on hold, waiting without going further, while 1 waits just one millisecond; after that millisecond the loop starts again from here and we process the next frame.

Let's add something else: we also want to detect whether a key is pressed, so that we can break the loop in real time and close the video if we don't want to go further. If the key is 27, which is Esc on the keyboard, we break the loop. Then remember to release the camera with cap.release(), so that Python releases the camera; otherwise it can
happen that Python keeps the camera open even when you are not using it. Then call cv2.destroyAllWindows(). That's pretty much all we have to do to run YOLO in real time.

Now let's just run the script and see. Okay, this is YOLO detecting in real time. Let me show some objects I have with me. I have a phone; let's see if it gets detected: cell phone. I have another phone, a Nokia; let's see if it can detect both phones, and it does. Let's take a book: the chair is also detected, the book not that well. Now the mouse gets detected as a cell phone, and here is the book. Let's try the glasses too: motorbike, another mistake.

The first thing that should be clear to you is that this detection is extremely slow, which is one of the disadvantages of working with the CPU. What we can do now is optimize the code as much as possible to make it faster, so let's do that. First, I was printing the indexes; we don't need to print them, so I'll just delete that line.

Next, the detection has to process images, and the smaller the images are, the less data the CPU has to process, so it will be faster. Let's work on that. We have the blob, so let's change its size: this is where we prepare the images to pass them to the network, where they are processed by YOLO. The blob currently has a size of 416 by 416. YOLO accepts different sizes: 416 by 416 is the standard one, 320 by 320 is the smaller one, and there are bigger sizes such as 608 by 608, and others. What you should know is that the smaller the size, the faster it runs, but also with less accuracy.

Let's also do something else: I want to compare the speed between them, so what
we want to do now is measure how many frames we are able to process per second. We import time so that we can count the seconds, and write time_now = time.time(). We also want to count the frames, so frame_id = 0, and each time we get a new frame we simply do frame_id += 1.

Then, further down, before showing everything on the screen, we compute the frames per second. First we calculate how much time has passed: elapsed_time = time.time() - starting_time, the time right now minus the starting time, so let's rename time_now to starting_time. Each time we show a frame, we know how much time has passed and how many frames we have processed, so dividing the frames by the time gives the frames per second: fps = frame_id / elapsed_time.

Now we show this on the screen with cv2.putText. We put the text on the frame: "FPS: " plus str(fps). Then we define the position; we'll put it in the top left of the screen, 10 from the left and 30 from the top. Then come the font, the size of the text (let's say 3), the color (let's make it black, (0, 0, 0)) and the thickness, 1. Let's define the font before starting the loop: we don't need to define it on every iteration, since it's always the same, so we set it once here and always use it. Let's see if there is a better font among the cv2.FONT_ constants; let's keep this one, cv2.FONT_HERSHEY_PLAIN.

Now let's run the script and see what we get, and above all whether the speed improved. Okay, this is the number of frames we are processing per second: as you see, 0.6
so each second we process less than one frame. It's still really slow, but much faster than it was before; now we are at around 0.7. You might get a better result if you have a better CPU than mine: I have a fairly old computer with an Intel Core i5. Let's try some objects: cell phone. I have the same objects as before; let me find something else. I don't know if headphones are part of the dataset and can be detected; probably not. Anyway, it's still a decent result compared with before.

What we can do now is try the same detection on a video from a file. Instead of passing the webcam to VideoCapture, I load the file. Let me see which file I have; I forgot the name. Okay, walking.mp4. If you want to test the same file, I will put it on the tutorial page so you can download it. Let me increase the size of the script. Okay, walking.mp4, and let's run the script and see what we can detect.

Okay, this is the detection on a video from a file, and we are still processing around 0.7 frames per second, which is a terrible speed for a video. The problem is that on a video we process every frame, so if a video has 30 frames per second, at 0.7 frames per second we need more than 40 seconds to see just one second of the real video. This is a big problem.

So I want to introduce another version of YOLO, Tiny YOLO. Tiny YOLO is YOLO optimized for CPUs, that is, optimized for speed. Here we have the weights, yolov3.weights, and the YOLOv3 configuration file; instead, let's load the YOLOv3-tiny weights and the YOLOv3-tiny configuration file. You can download these files too; I will put the link on my video tutorial. You need them right here, so here I have YOLO
version 3 weights and YOLO version 3 tiny weights, and the same goes for the configuration files, yolov3.cfg and yolov3-tiny.cfg. The coco.names file is the same, since both models work with the same dataset of these 80 objects listed here. So let's now try Tiny YOLO on the same video and see what speed we get. First let me increase the size, and now run the script.

Okay, now it's much, much faster: we get around 5 frames per second. We can't read the number very well here, but it's 5.5. Let me increase the text size to 4 and the thickness to 3. Let's also round the FPS to two digits after the decimal point, and run it again. Okay, now it's better, but we still can't see it, so I need to move the text a bit lower; let's change the position and run it again. This is the number of frames per second we are processing, around five.

What you can notice, though, is that the detection is much less accurate than with the full YOLO: even this truck is not detected very well, and in some frames the person is not detected at all. I want to try the webcam again, and later I will explain how we can make some other improvements. So instead of the walking.mp4 capture from before, let's load 0, which stands for the webcam, and run the script again.

Okay, you can see the detection is quite good now. It's not really smooth, but it's almost real-time detection: we have around six frames per second. Keep in mind that I'm also recording the screen, so part of my CPU is being used by the recording software. You might get much more: I believe 10, or with a new computer even 15 frames per second, only with the CPU, which is a great result. Let's check how the detection works: the phone is not detected. Of course it's not as strong as the
full YOLO, because Tiny YOLO is the fast version but is not as accurate as YOLO.

What we can do now is try changing the threshold of the algorithm to see if we can get better detection. Not faster, though: to get it faster you just need a stronger computer, since this is the smallest size we can use (I believe 320 is the smallest), and with anything even smaller the detection would be really bad. So let's keep this one and just change the threshold. If we increase this threshold we get fewer objects, but the detections are more accurate. If we decrease it we get more objects, but the detections are less accurate, because we also get detections that the algorithm is not really sure are correct.

I would also like to show the confidence of each detection, so let's take the confidence here: confidence = confidences[i]. I also see that I made a mistake with the colors: we need to take the color for each class, not for each object. That was a mistake in the previous code, so I'm correcting it now. Then let's display the confidence with cv2.putText on the frame. It makes more sense to put it right after the label, so you see the name of what is detected and then its confidence: label plus confidence. Of course this must be a string, and we also want to round the confidence to two digits after the decimal point. Let's run the script.

Okay, now we see the detection and also the confidence: this one is almost sure that it's a person, so it's a good confidence threshold. Here we don't see anything for the phone; cell phone, 0.7, and
another problem of YOLO is that when two different objects are really close together, the detection doesn't work correctly. Those are some of the disadvantages of the algorithm, and we are running at six frames per second.

In the next video about YOLO (I'm not sure it will be the very next one) we will see how to use YOLO with the GPU on the NVIDIA Jetson Nano, so stay updated if you want to see that video. A quick reminder: I'm working on a video course about the Raspberry Pi with OpenCV, and I will also add some extra modules about the Jetson Nano and computer vision with deep neural networks and artificial intelligence in general. If you want to stay updated, I will put a link below in the description where you can sign up with your email, and I will let you know as soon as I have any news about the video course. This is all for the moment; enjoy your video detection.
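The capture loop described in the video follows a fixed pattern. It opens cv2.VideoCapture, reads frames in a while loop, waits 1 ms with cv2.waitKey, and breaks on Esc (key code 27). The sketch below keeps that pattern but takes the capture, the per-frame work and the key polling as plain callables. This lets the loop logic run without a camera. The callable-based design and the function name run_capture_loop are assumptions for illustration, not the tutorial's code. One deliberate difference: the sketch checks the ok flag returned by read, which the `_, frame = cap.read()` line ignores. At the end of a file such as walking.mp4, read returns no frame.

```python
from typing import Any, Callable, Optional, Tuple

ESC_KEY = 27  # key code cv2.waitKey() returns for the Esc key


def run_capture_loop(
    read_frame: Callable[[], Tuple[bool, Any]],
    process_frame: Callable[[Any], None],
    wait_key: Callable[[], int],
    on_exit: Optional[Callable[[], None]] = None,
) -> int:
    """Process frames until the source runs out or Esc is pressed (hypothetical helper).

    read_frame    -- returns (ok, frame), like cv2.VideoCapture.read
    process_frame -- detection, drawing and cv2.imshow for one frame
    wait_key      -- e.g. lambda: cv2.waitKey(1); waitKey(0) would block on every frame
    on_exit       -- cleanup, e.g. release the capture and destroy the windows
    Returns how many frames were processed.
    """
    processed = 0
    try:
        while True:
            ok, frame = read_frame()
            if not ok:  # end of a video file, or the camera delivered no frame
                break
            process_frame(frame)
            processed += 1
            if wait_key() == ESC_KEY:
                break
    finally:
        # Runs even if process_frame raises, so the camera is always released.
        if on_exit is not None:
            on_exit()
    return processed
```

With OpenCV you would pass cap.read, a function that runs the detection and cv2.imshow, lambda: cv2.waitKey(1), and an on_exit that calls cap.release() and cv2.destroyAllWindows().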
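The speed and accuracy trade-off of the blob size mentioned in the video (320, 416, 608) can be made concrete by counting the candidate boxes the network outputs. The size passed to cv2.dnn.blobFromImage sets this count, and every candidate has to be scored and filtered. YOLOv3 predicts three anchor boxes per grid cell at three scales (strides 32, 16 and 8), so the input must be a multiple of 32. YOLOv3-tiny uses only the two coarser scales. The sketch below is a rough proxy only: it counts output boxes, not the actual convolution cost, which grows roughly with the number of input pixels. The function name is hypothetical.

```python
def yolov3_prediction_count(input_size, strides=(32, 16, 8), anchors_per_scale=3):
    """Number of candidate boxes YOLOv3 outputs for a square input_size blob.

    Full YOLOv3 predicts at three scales (strides 32, 16 and 8) with three
    anchor boxes per grid cell; YOLOv3-tiny uses only strides 32 and 16.
    """
    if input_size % 32 != 0:
        raise ValueError("input size must be a multiple of 32")
    return sum(anchors_per_scale * (input_size // s) ** 2 for s in strides)


if __name__ == "__main__":
    for size in (320, 416, 608):
        full = yolov3_prediction_count(size)
        tiny = yolov3_prediction_count(size, strides=(32, 16))
        print(f"{size}x{size}: {size * size} pixels, {full} boxes (tiny: {tiny})")
```

For example, a 416 by 416 blob yields 3 × (13² + 26² + 52²) = 10,647 candidate boxes, while 320 by 320 yields 6,300. This is one reason the smaller blob runs faster.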
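The frames-per-second measurement from the video can be sketched as a small class. It records starting_time = time.time(), increments frame_id once per frame, and computes fps = frame_id / elapsed_time. The class name FpsCounter and the injectable clock (used so the arithmetic can be checked without waiting) are assumptions for illustration. The label rounds to two digits after the decimal point, as in the video.

```python
import time
from typing import Callable


class FpsCounter:
    """Average frames per second since the loop started (hypothetical helper).

    Mirrors the computation described in the video:
    fps = frame_id / (time.time() - starting_time).
    """

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self.starting_time = clock()
        self.frame_id = 0

    def tick(self) -> None:
        """Call once per frame read from the capture."""
        self.frame_id += 1

    def fps(self) -> float:
        elapsed_time = self._clock() - self.starting_time
        if elapsed_time <= 0:  # avoid dividing by zero right after starting
            return 0.0
        return self.frame_id / elapsed_time

    def label(self) -> str:
        """Text for cv2.putText, rounded to two digits after the decimal point."""
        return "FPS: " + str(round(self.fps(), 2))
```

In the loop you would call counter.tick() after each cap.read() and draw the label with something like cv2.putText(frame, counter.label(), (10, 50), cv2.FONT_HERSHEY_PLAIN, 4, (0, 0, 0), 3), where (10, 50) is an arbitrary position. Because this is a cumulative average it settles slowly. Averaging only over the most recent frames would react faster, for instance right after switching from YOLOv3 to YOLOv3-tiny.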
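Switching from YOLOv3 to Tiny YOLO, as done in the video, only changes the weights and configuration files; the class list in coco.names stays the same. The sketch below assumes the standard Darknet file names (yolov3.weights, yolov3.cfg, yolov3-tiny.weights, yolov3-tiny.cfg). Your downloads may be named differently, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed standard Darknet file names; rename these if your downloaded
# files are called something else.
MODEL_FILES = {
    "yolov3": ("yolov3.weights", "yolov3.cfg"),
    "yolov3-tiny": ("yolov3-tiny.weights", "yolov3-tiny.cfg"),
}


def model_paths(variant: str, folder: str = ".") -> Tuple[str, str]:
    """Return (weights_path, config_path) for the chosen YOLO variant."""
    weights, config = MODEL_FILES[variant]
    return str(Path(folder) / weights), str(Path(folder) / config)


def load_class_names(names_path: str) -> List[str]:
    """Read one class name per line (coco.names lists the 80 COCO classes)."""
    with open(names_path) as f:
        return [line.strip() for line in f if line.strip()]
```

With OpenCV, net = cv2.dnn.readNet(*model_paths("yolov3-tiny")) would load the tiny model. Changing the variant string is then the only edit needed to compare the two models' speed.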
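The confidence threshold, the label with the rounded confidence, and the per-class color fix at the end of the video all belong to the post-processing step. This is a minimal sketch of it, assuming the row layout cv2.dnn returns for Darknet YOLOv3 outputs: box center, width and height relative to the frame, then an objectness score, then one score per class. Like common cv2.dnn YOLO scripts, the sketch takes the highest class score as the confidence and ignores the objectness column. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

import numpy as np


def decode_detections(
    outputs: Sequence[np.ndarray],
    frame_width: int,
    frame_height: int,
    conf_threshold: float = 0.5,
) -> Tuple[List[List[int]], List[float], List[int]]:
    """Convert raw YOLOv3 output rows into pixel boxes, confidences and class ids.

    Assumed row layout:
    [center_x, center_y, width, height, objectness, score_class0, score_class1, ...]
    with coordinates relative to the frame size (0..1).
    """
    boxes, confidences, class_ids = [], [], []
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            # Raising conf_threshold keeps fewer, surer detections; lowering it keeps more.
            if confidence > conf_threshold:
                center_x = int(detection[0] * frame_width)
                center_y = int(detection[1] * frame_height)
                w = int(detection[2] * frame_width)
                h = int(detection[3] * frame_height)
                x = int(center_x - w / 2)  # top-left corner of the box
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(confidence)
                class_ids.append(class_id)
    return boxes, confidences, class_ids


def make_label(classes: Sequence[str], class_id: int, confidence: float) -> str:
    """Class name followed by the confidence rounded to two decimals."""
    return classes[class_id] + " " + str(round(confidence, 2))


def color_for(colors: np.ndarray, class_id: int) -> Tuple[int, ...]:
    """Pick the drawing color by class, so every object of one class shares a color."""
    return tuple(int(c) for c in colors[class_id])
```

Indexing colors by class_id rather than by the detection's position in the list is the correction described in the video. Without it, the same class can be drawn in a different color from one frame to the next.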
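The indexes the script was printing are, assuming it follows the usual cv2.dnn YOLO pattern from the previous video, the boxes kept by non-maximum suppression. OpenCV provides this as cv2.dnn.NMSBoxes(boxes, scores, score_threshold, nms_threshold). Non-maximum suppression (NMS) is also one plausible reason, though the video does not say so, why two objects that are really close together can be detected poorly. A box that overlaps a higher-scoring box by more than the NMS threshold is discarded, even if it belongs to a different object. The greedy version below is a plain-Python illustration of the technique, not OpenCV's implementation. The defaults 0.5 and 0.4 are assumed example values.

```python
from typing import List, Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold=0.5, nms_threshold=0.4) -> List[int]:
    """Indexes of the boxes kept by greedy non-maximum suppression.

    Boxes at or below score_threshold are dropped; the rest are visited from
    highest to lowest score, and a box is discarded if it overlaps an already
    kept box with IoU above nms_threshold. Class labels are ignored here.
    """
    candidates = [i for i, s in enumerate(scores) if s > score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in candidates:
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep
```

Raising nms_threshold lets more overlapping boxes survive, which helps with objects that sit close together but can also leave duplicate boxes on a single object.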

Info

Channel: Pysource

Views: 64,280

Keywords: yolo, yolo detection cpu, yolo opencv, opencv tutorial, opencv python, dnn, cv2.dnn, deep neural network

Id: xKK2mkJ-pHU

Length: 25min 14sec (1514 seconds)

Published: Mon Jul 08 2019