Object Detection with YOLO v8 on Mac M1 | OpenCV with Python tutorial

Captions
...and you can see how incredible the speed is now, leveraging the Mac GPU. Hi, welcome to this new video. I'm Sergio, a computer vision consultant, developer and course instructor; I build computer vision solutions to help companies improve their process efficiency, reliability and scalability. Today we're going to see the installation of object detection on the Mac M1: we will write the code from scratch in Python to detect objects on a video, and then we'll see how to leverage the M1's graphics card to get very high performance for the detection. The reason I'm doing this is first to show you how to install and use object detection on the Mac M1, but also how to make it much faster and see just how fast the Mac M1 can run object detection. At the end of the video I'm also going to do some benchmarks: I'll compare the Mac M1 GPU with its CPU, and also with a Windows computer with an Nvidia RTX 3060 and another graphics card, a GTX 1660 Ti.

Let's start with the installation of the libraries we need for object detection. Open the terminal and run pip install ultralytics, then press Enter. I won't run it because I already have it installed, but that's the only installation you have to do for Python: ultralytics pulls in all the other dependencies we need, for example OpenCV, the library we use for image processing in general. Then I created a project in PyCharm. I won't go into details about PyCharm; you can use any editor you prefer to write your Python code, just follow along with everything I'm writing. We'll play with dogs.mp4, the video file where we're going to detect dogs, and of course you can use your own file to detect your own objects. All the code I'm writing here will be available to download from the link I'll put in the description, so you can get access to the same dogs.mp4 file and the code. For this exercise, though, I recommend not downloading the code but writing it yourself while following the video, rather than just using the ready-made code.

Let's now load the video with the OpenCV library. We start the code with import cv2, the OpenCV library that was installed together with ultralytics. To load the video we create a capture object: cap = cv2.VideoCapture("dogs.mp4"). I have the video in the same folder, so the path is just "dogs.mp4"; if your video is in another folder you need to put the full path. Now let's take the frames from this video: ret, frame = cap.read(). This function returns ret, which says whether there is a frame (True) or not (False), and then the frame we want to use. Let's show the frame on the screen: cv2.imshow("Img", frame). When we run this, the code gets executed, it opens a window with the frame, and then it closes because the execution of the Python code finishes and everything stops. So we need a wait-key event, cv2.waitKey(0), to keep everything on hold until we press a key.

We run this and everything works, but we have only one image: we press a key and it closes. That's because we have a capture object but we're taking only one frame. Consider that a video is nothing more than one frame after another, so we need to get all the frames in a loop. We put everything inside a while True loop: it takes a frame, shows it, waits for us to press a key, and then takes the next frame, and so on. As you can see, it's loading a frame, waiting for a key press, I press a key, it takes another frame, and so on. I don't want it to wait for me to press a key; I want it to run in real time. So instead of cv2.waitKey(0) we put cv2.waitKey(1): the frame isn't on hold any more, it just waits one millisecond and then goes to the next frame. Let's run this again. Of course it's very, very fast, but I can't do anything: pressing a key doesn't close it. To stop it we need a key event: when you press a key, that key corresponds to some value, and if the key equals 27, which is the Esc key on the keyboard, we break the loop and the video stops.

That would be enough, but there's another thing I want to show you: right now we get an error. What's happening is that it's taking frame after frame, and at some point there are no more frames, so frame is empty (None) and we try to show a frame that doesn't exist. On the imshow line we get an error, and it's clearly explained: "size is not greater than zero" — there is no frame, and we're trying to show a frame that doesn't exist. To avoid this we check for the end of the video: if not ret, it means there are no more frames, and we break the loop. So we can break for two reasons: either there are no more frames, or we want to quit by pressing Esc, ASCII code 27 on the keyboard. If I run this now and press the Esc key, it stops.

One more thing: cap.release(). It usually doesn't matter for such short scripts, but when this is part of a bigger program it's essential: if you're done with a video but never release it, the capture object keeps holding the file, so if you try to open the video from another program it may say you cannot access it while the Python code is executing. So we call cap.release() to release the video, and cv2.destroyAllWindows() to close any windows that OpenCV might have open. So far this is just very basic computer vision to load an image and a video; the goal of this video is object detection, so let's go further. We're going to use ultralytics because, at the moment I'm recording this, YOLO version 8 is the fastest and latest deep-learning object detection model.
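Putting the video-loading steps together, here is a minimal sketch of the playback loop (it assumes opencv-python is installed and dogs.mp4 sits in the working directory; the window name "Img" and the helper names are mine):

```python
# Minimal playback loop as described above: read frames, show them,
# stop on Esc or when the video runs out of frames, then release.

def should_stop(ret, key):
    """Stop when there are no more frames or Esc (ASCII 27) was pressed."""
    return (not ret) or key == 27

def play(path="dogs.mp4"):
    import cv2  # imported here so should_stop() stays dependency-free
    cap = cv2.VideoCapture(path)
    while True:
        ret, frame = cap.read()
        key = -1
        if ret:
            cv2.imshow("Img", frame)
            key = cv2.waitKey(1)  # wait 1 ms per frame instead of waitKey(0)
        if should_stop(ret, key):
            break
    cap.release()            # free the file for other programs
    cv2.destroyAllWindows()  # close any OpenCV window still open
```

Call `play()` to run it; swapping `cv2.waitKey(1)` for `cv2.waitKey(0)` pauses on every frame, as shown earlier.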
It's very fast, very precise and very accurate, and since it's built on PyTorch it can leverage the graphics card of the Mac M1, which is great. So from ultralytics we import YOLO. Now we need to load the object detection model: model = YOLO("yolov8m.pt"). If we put just "yolov8m.pt" as the path, the model is downloaded automatically. There are five different models; yolov8m is the medium one, and there are smaller and bigger versions: the smaller ones are less precise but faster, the bigger ones more precise but slower. We won't get into the details here; let's stick with the default medium version, yolov8m. This is a YOLO model pre-trained to detect 80 different classes — which ones, we'll see later. Now let's write the code to detect the objects. Once the model is loaded, we pass the frame into it, and the model returns the bounding boxes with the positions of the objects, which object each one is, and the confidence of each detection: results = model(frame). For now I'm going to put cv2.waitKey(0) back to keep everything on hold, so I can print something without it printing continuously in the loop. Let's slowly understand what's happening with results: print(results) and run the code. It takes a while, because object detection is not as fast as plain OpenCV. First it downloads the model, since we don't have it yet — a PyTorch file of around 50 megabytes for this default medium model — and then it needs to load the model into memory, which is why it takes a while to start. Our output is a big array with a lot of information inside.
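The loading step can be sketched like this (assuming `pip install ultralytics`; the checkpoint is auto-downloaded on first use, and the helper name `model_file` is my own, not an ultralytics API):

```python
def model_file(size="m"):
    """Pick one of the five pretrained YOLOv8 checkpoints (n, s, m, l, x)."""
    if size not in ("n", "s", "m", "l", "x"):
        raise ValueError("size must be one of: n, s, m, l, x")
    return f"yolov8{size}.pt"

def detect(frame, size="m"):
    from ultralytics import YOLO  # local import: heavy optional dependency
    model = YOLO(model_file(size))  # downloads ~50 MB for "m" on first run
    return model(frame)             # list of Results, one per input image
```

Smaller checkpoints ("n", "s") trade accuracy for speed; larger ones ("l", "x") the reverse.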
This information concerns the position of each object, its class, and so on, and now we need to extract it. We take a single result: result = results[0]. Why am I doing this? Because the API is made so that you can pass an array with multiple images at the same time; since we're using only one image at a time, we take the first index. From the result we can access the bounding boxes: bboxes = result.boxes.xyxy. Let's print(bboxes) and run it. What we get are bounding boxes defined by two points — top-left and bottom-right, so x1, y1, x2, y2 — which we can use to draw them. Of course this information is not yet ready to draw on the screen: we have values like 671-point-something and 267-point-something, and this is a PyTorch tensor, which we can't use directly inside OpenCV, so we need some more preprocessing. We call .cpu() on it and then convert everything into a NumPy array: import numpy as np, then np.array(..., dtype="int"), because we want integers. We don't want float numbers when drawing at a pixel position: if we have 667-point-something, it's either 667 or 668. Let's print(bboxes) again and run it.

The way I'm extracting this requires a lot of preprocessing that could normally be simplified: we could write functions or classes that do everything for us so the code stays very clean later, and that's what I do when I build a project — I usually have classes and everything well organized. The only reason I'm doing it this way is so that you can play with it from scratch; it's a good exercise for understanding how to extract the information from the underlying library. So now the bounding boxes contain only integers — values like 661, 71, 2067 in a NumPy array — which is exactly what we wanted, and from this we can extract what we need. Since there can be multiple bounding boxes, we loop over them to display each detection: for bbox in bboxes, then x, y, x2, y2 = bbox. Let's print a couple of them, x and y, just to make sure we're extracting everything correctly — and we are. Once we have x, y, x2, y2 we can draw a rectangle, because this is the position of each single object detected on the screen: cv2.rectangle(frame, (x, y), (x2, y2), (0, 0, 225), 2). We draw on the frame, and to draw a rectangle we need the two points: (x, y) is the top-left point and (x2, y2) the bottom-right point. Then we decide the color; let's make the rectangle red, since it's the easiest to see against the contrast in this video. Colors are in BGR format, each channel from 0 to 255, so zero blue, zero green, and around 225 red. Finally the thickness: two pixels, so it's clearly visible. Let's stop and run this — and there are the dogs. We could also run this in real time; right now cv2.waitKey(0) keeps everything on hold, but object detection is now working.
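The tensor-to-pixels preprocessing above can be wrapped in small helpers — a sketch; `to_int_array` and `draw_boxes` are my names, not part of the ultralytics API:

```python
import numpy as np

def to_int_array(t):
    """Move a tensor off the GPU if needed and truncate to integer pixels."""
    t = t.cpu() if hasattr(t, "cpu") else t  # torch tensors need .cpu() first
    return np.array(t, dtype="int")

def draw_boxes(frame, result):
    import cv2  # local import keeps the numeric helper OpenCV-free
    for (x, y, x2, y2) in to_int_array(result.boxes.xyxy):
        # red box in BGR, 2 px thick, top-left to bottom-right corner
        cv2.rectangle(frame, (x, y), (x2, y2), (0, 0, 225), 2)
    return frame
```

`to_int_array` also accepts plain lists, which makes it easy to test without PyTorch.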
This is a first step; we can of course add more information, such as the class of each object or the confidence score. Let's at least add the class: classes = np.array(result.boxes.cls.cpu(), dtype="int"). I double-checked the documentation to make sure: the attribute is cls, not class, and we use the same int data type as before. Now we iterate over both arrays at the same time with zip(bboxes, classes): classes and bounding boxes are two arrays of identical size, so for the first bounding box — the first object — we also have its class, and likewise for the second; they correspond one to one. To show the class on the frame we use cv2.putText: we draw on the frame, the text to display is str(cls), and for the position I don't want the text to overlap the box, so I place it at the top-left but a bit above, at (x, y - 5), drawing the text after the rectangle. Then we need to specify the font — you can either put an index or something like cv2.FONT_HERSHEY_PLAIN; OpenCV has a few fonts, and we don't really care how it looks for this exercise — then the size of the text, say 1, the same red color (0, 0, 225), and a thickness of 2. Let's run this to make sure there are no typos and the putText is correct... and there is some problem, of course: "cannot unpack numpy int64 object" on the loop line. I'm extracting bounding boxes and classes, but in the loop I had the class first and the bounding box second, so I need to swap them. Now it's correct, and we can see a number, 16, above both dogs — we have only two dogs, so both get the same class.

Now you might be wondering how to show the class name instead of the number. Usually we have a text file where each line is a class name, associated by index. This pretrained model can detect 80 different classes from the COCO dataset — very common objects like person, bicycle, car, motorbike and so on — and among them is dog, which is on the 17th line, so counting from zero it's class 16. I won't do it in this video because it would take too long and it's not the purpose here, but you can load the text file into an array and then take index 16 from that array of class names, which will be "dog"; that's how you get the class name as well. That's pretty good: we're showing the objects we detect and we're getting the class too. If you load a video showing a person, for example, it will be class 0, because person is the first class. I recommend you try this with different objects; I'm going to include the class list in the code you can download from the blog post, so you can play with all 80 classes for your own projects. That's it for the object detection code in general.
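The index-to-name lookup just described can be sketched like this, with the COCO list truncated to the first 17 entries — just enough to reach "dog" at index 16; the full 80-name list ships with the downloadable code, and the helper names are mine:

```python
# Standard COCO class ordering, truncated for this sketch.
COCO_NAMES = [
    "person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "stop sign",
    "parking meter", "bench", "bird", "cat", "dog",
]

def class_name(cls_id):
    """Return the COCO name for an index, or the raw index if out of range."""
    return COCO_NAMES[cls_id] if 0 <= cls_id < len(COCO_NAMES) else str(cls_id)

def draw_labels(frame, bboxes, classes):
    import cv2  # local import keeps class_name() testable without OpenCV
    for (x, y, x2, y2), cls in zip(bboxes, classes):  # box first, then class
        cv2.rectangle(frame, (x, y), (x2, y2), (0, 0, 225), 2)
        # label just above the top-left corner so it doesn't overlap the box
        cv2.putText(frame, class_name(cls), (x, y - 5),
                    cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 225), 2)
    return frame
```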
Let's now run this in real time, because the crucial point of this video for me is to see how fast it is on the Mac M1 and whether we can make it faster. What we see is the video playing very, very slowly, because it's performing object detection on every frame — of course the speed drops compared with just loading the video. Below, we get some timings for each single frame: for example, one frame took around 487 milliseconds to process. Half a second per frame means only around 2 frames per second, which is really, really bad. But you need to know one thing: we are not leveraging the Mac M1's GPU yet. By default it's using the CPU. What we need to do is enable the MPS backend from PyTorch. First, let's make sure the MPS backend is available: import torch, then print(torch.backends.mps.is_available()). We can run this before loading the model, and it tells us True or False. We get True, which is all we need: MPS is available. When you install ultralytics, PyTorch with MPS support for the Mac is installed too. There are some requirements, of course: you need an Apple-silicon Mac like the M1, and there is a minimum macOS version — I'm not sure which is the minimal version, but I'll link the official information from PyTorch so there is no mistake on that.

So MPS is available, but not yet enabled for the detection. We need to tell the ultralytics library to use the MPS device: where we call the model on the frame, we add device="mps", between quotation marks since it's a string. This option can select different devices: we can choose "cpu", or on a Windows or Linux computer with Nvidia graphics cards we can choose which card to run on, in case there are several. Let's now run this with MPS. Remember, in the benchmarks before we were getting around two frames per second — half a second to process a frame — and you can see how incredible the speed is now, leveraging the Mac GPU: we're processing the video pretty much in real time. I'll let it run for a bit; we now have real-time object detection on the Mac M1. If we check the timings, the speed is more than 10 times faster: we were at around 480-490 milliseconds per image, and now one image takes 42 milliseconds, which is incredible. I was very impressed the first time I ran object detection on the Mac M1, because it's not so far from good Nvidia RTX graphics cards. For such a device — I'm testing this on a Mac mini M1 — this result is very, very impressive.

And that's it for object detection on the Mac M1: you run this basic code, you put device="mps" to leverage the M1's GPU, and the detection runs on it. This code will be available for download below; again, I recommend you write it yourself as an exercise. It can be improved a lot: we can add the class names, use different colors for different classes, and improve the structure of the code by creating functions and putting them in another file so the main code stays very clean. That's what we should aim for in a real project.
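The device switch can be made defensive — a sketch; `pick_device` is my own helper, falling back to the CPU when the MPS backend (or PyTorch itself) isn't available:

```python
def pick_device():
    """Prefer Apple's Metal (MPS) backend when available, else use the CPU."""
    try:
        import torch
        if torch.backends.mps.is_available():
            return "mps"
    except (ImportError, AttributeError):
        pass  # PyTorch missing, or too old to have the MPS backend
    return "cpu"

# usage, with model and frame from the earlier steps:
# results = model(frame, device=pick_device())
```

On an Nvidia machine you would instead pass a GPU index (0, 1, ...) as shown in the benchmarks below.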
Still, this is a great start for learning object detection and playing around with the Mac M1 or other devices, because the process is very similar on Windows and Linux. Now let's do some benchmarks. On the Mac M1 we had around 500 milliseconds per frame on the CPU and around 40 milliseconds on the GPU. Let's now see how this works on Windows with an Nvidia RTX 3060. I'm testing the exact same code on Windows; this computer has two graphics cards, an Nvidia RTX 3060 and an Nvidia GTX 1660 Ti. I'll start with the RTX 3060: instead of "mps" I use device=0, which is the first GPU. So remember: "mps" on the Mac to leverage the Mac GPU, "cpu" to run on the CPU, and with Nvidia devices we put 0 for the first card, 1 for the second, and so on. I won't go into installation details, but PyTorch on Windows and Linux for Nvidia GPUs needs an extra step — installing PyTorch with CUDA support — in case device=0 doesn't work for you. Let's run it: same code, same exact video. You can see Python 3.9, CUDA, Nvidia RTX 3060, and in the benchmark we get around 16-18 milliseconds per image. I'd say this GPU is not fully used — the code could be optimized, and the GPU sits at around 50 percent of its capacity — but still, the performance is of course faster than the Mac M1, yet not so much faster; not so impressive in comparison. We had around 40 milliseconds on the Mac M1 GPU, and here we're around 16-18 milliseconds. Again, this could be improved: if we leveraged the GPU fully, we could roughly cut the timing in half and double the speed.

Let's also test the other, older GPU, the Nvidia GTX 1660 Ti, with device=1. The GTX 1660 Ti, with six gigabytes of video RAM, gives us around 23 to 30 milliseconds. This is closer to the Mac: still faster, of course — we're talking about two good Nvidia graphics cards — but not that much faster. Let's do one last test on the CPU: I have an AMD Ryzen 5 2600, a somewhat old CPU for this, and we get performance similar to the Mac's CPU, around half a second per image.

I hope this video was useful for playing around with object detection on your Mac. If you want to learn more about object detection and building projects, you can check the courses I have at pysource.com. If you have a company or a startup and want me to build your project, you can contact me from pysource.com: there is a contact form where you can ask for a quote. This is all for this video; I'll see you in the next one.
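As a footnote to the benchmarks: the per-frame timings quoted throughout convert to frames per second like this (the specific numbers are the approximate ones measured in the video):

```python
def fps(ms_per_frame):
    """Convert a per-frame processing time in milliseconds to frames/second."""
    return 1000.0 / ms_per_frame

# Approximate figures from this video's benchmarks:
#   Mac M1 CPU   ~487 ms/frame  -> ~2 FPS
#   Mac M1 GPU    ~42 ms/frame  -> ~24 FPS
#   RTX 3060      ~17 ms/frame  -> ~59 FPS
#   GTX 1660 Ti   ~25 ms/frame  -> ~40 FPS
```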
Info
Channel: Pysource
Views: 24,139
Id: kEcWUZ8unmc
Length: 34min 23sec (2063 seconds)
Published: Tue Mar 28 2023