YOLO-NAS Custom Object Detection on Webcam - New State-of-the-art YOLO Model

Captions

Hey guys, welcome to a new video. We have a new state-of-the-art object detection model from Deci called YOLO-NAS. This is a pretty nice model, and we're going to go through all of it. I'll show you how to set it up in Google Colab, how to take our own custom dataset and train on it in Google Colab, and then how to deploy the model and get results. Again, this is new state-of-the-art performance on object detection. We're going to look at some graphs and go over the performance: it achieves higher accuracy and also reduces inference time compared to, for example, the YOLOv7 and YOLOv8 models. So we have a new state-of-the-art YOLO model, and I'm going to show you how to set all of it up. We'll do exactly the same thing as we did in the YOLOv8 video; now we just have a model that outperforms YOLOv8, so this is going to be really interesting. First of all, we're going to take a look at the graphs and compare the new YOLO-NAS models. Deci also optimized the models for quantization, so we can improve the inference time even more. NAS stands for neural architecture search, so this is not a new YOLOv9 model; it is YOLO-NAS, and it is really good, as you're going to see in just a second. Let's look at the performance of the new YOLO-NAS models compared to the older YOLOv8, YOLOv5 and YOLOv7 models. In dark blue we have the YOLO-NAS models, in both an INT8 version and an FP16 (16-bit floating point) version, so this new state-of-the-art object detection model also comes with a quantization-aware architecture. On the x-axis we have the latency in milliseconds, which reflects inference speed, and on the y-axis we have the accuracy, measured as mean average precision (mAP). This is a benchmark on the COCO dataset for object detection.
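Reading the benchmark plot as "faster and more accurate" is a Pareto comparison: one model dominates another when it is no slower and no less accurate, and strictly better on at least one axis. A minimal sketch of that check; the points below are made up for illustration and are not the published YOLO-NAS, YOLOv8 or YOLOv7 numbers.

```python
from typing import NamedTuple


class ModelPoint(NamedTuple):
    name: str
    latency_ms: float  # x-axis: lower is better
    map_score: float   # y-axis: higher is better


def dominates(a: ModelPoint, b: ModelPoint) -> bool:
    """True if a is no slower and no less accurate than b, and strictly better on one axis."""
    no_worse = a.latency_ms <= b.latency_ms and a.map_score >= b.map_score
    strictly_better = a.latency_ms < b.latency_ms or a.map_score > b.map_score
    return no_worse and strictly_better


def pareto_front(points: list[ModelPoint]) -> list[ModelPoint]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]


# Made-up numbers for illustration only, NOT published benchmark results.
points = [
    ModelPoint("family_a_small", 10.0, 0.60),
    ModelPoint("family_b_small", 12.0, 0.55),
    ModelPoint("family_a_large", 30.0, 0.75),
    ModelPoint("family_b_large", 35.0, 0.72),
]
print([p.name for p in pareto_front(points)])
```

A model family "beats" another in the sense of the plot when its points make up the non-dominated front.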
COCO is a standard object detection benchmark dataset, and we'll take a look at some other benchmarks later on. The YOLO-NAS models in dark blue are basically superior to all the other YOLO models: they are faster at inference and they also have higher accuracy. The different dots are the different variants of each model family, so we have small ones, medium ones, large ones and so on. If we take the small ones as a baseline, the small YOLO-NAS model is faster than the comparable YOLOv8 model and also has higher accuracy, so it outperforms all the other models here. This is the new state-of-the-art object detection model. We get even faster inference with the INT8 version compared to the FP16 version, and we don't lose much accuracy by doing this quantization. That is a very nice advantage over the other models, especially when we are deploying models in production and in real life. So Deci made small changes here and there, and now we have models built with quantization and production deployment in mind that still get higher accuracy on the different benchmarks. Now let's jump straight into Google Colab. I'm going to show you how to set up the whole model: how to install and import the different modules, how to use our own custom dataset from Roboflow, and how to train the YOLO-NAS model. Then we'll look at the results and see how to export the model and use it in our own Python script. This Google Colab notebook will be available further down in the description, so you can go in and run through all the blocks of code as I'm going to do now.
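On the INT8 versus FP16 point: a minimal sketch of symmetric per-tensor INT8 quantization, which scales floats into the range -127..127, rounds them, and scales them back. This is the generic textbook scheme, not Deci's quantization-aware approach, and the weight values are arbitrary. The round-trip error per value is at most half the scale, which helps explain why a well-designed INT8 model can lose little accuracy.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: map [-max|v|, max|v|] onto [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # round() gives the nearest integer; clamp guards against float edge cases.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map the integers back to approximate float values."""
    return [x * scale for x in q]


weights = [0.42, -1.3, 0.05, 0.9, -0.27]  # arbitrary example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"scale={scale:.5f}", f"max_error={max_err:.5f}")
```

Each INT8 value takes 1 byte, versus 2 bytes for FP16 and 4 for FP32.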
Then you can go ahead and train your own custom object detector on your own custom dataset. We'll start in Roboflow. I have this cup detection dataset that I've been using in other videos as well, including my YOLOv8 video. We just go up and export the dataset, choosing the YOLOv5 format for YOLO-NAS. We can then show the download code, copy it, and paste it into our Google Colab notebook where we import our dataset, so we can train on our own custom dataset from Roboflow or any other annotation tool that you're using. Now we're back in the Google Colab notebook. Let's run these blocks of code to install the modules we need. We need super-gradients from Deci AI. Right now I'm using this specific version because the model is pre-launch; once it is launched, it will just be pip install super-gradients. Again, everything is in this notebook, so you can execute it directly and train your own models without any errors. While it's installing, we can go through the next couple of code blocks. We use super-gradients with from super_gradients.training import models, and then we can directly use the pretrained YOLO-NAS models, just as we did with YOLOv8 and other models. We can choose between the small, medium and large models, and we can also choose which pretrained weights to use, for example the COCO weights.
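The YOLOv5 export from Roboflow stores one .txt label file per image, with one line per object in the form "class_id x_center y_center width height", where the coordinates are normalized to 0..1 by the image width and height. A minimal sketch converting one such line to pixel corner coordinates; the label line and image size are hypothetical.

```python
def yolo_line_to_xyxy(line: str, img_w: int, img_h: int) -> tuple[int, tuple[float, float, float, float]]:
    """Parse 'class xc yc w h' (normalized to 0..1) into (class_id, (x1, y1, x2, y2)) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    xc, yc, w, h = (float(p) for p in parts[1:5])
    # Scale normalized values back to pixels.
    xc, w = xc * img_w, w * img_w
    yc, h = yc * img_h, h * img_h
    return class_id, (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


# One hypothetical label line for a 640x480 image.
class_id, box = yolo_line_to_xyxy("0 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
print(class_id, box)
```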
Later on, I'm going to show you how to use a custom-trained model instead of these pretrained weights: we'll load in the weights we trained inside this Colab notebook, create an instance of our model with a single line of code, and then use that model directly to run inference and make predictions on new images. So it is really easy to use. It was exactly the same with the YOLOv8 model, also just a single line, but now we have a model that performs better and also has lower latency. After the modules have been installed, we need to go up and restart the runtime. You have to do this, or else you'll run into an error. This is the only thing you need to do, and then you should be able to run all the code cells without any errors and see the results on our own custom dataset. Now we're reconnected, and we don't need to pip install the modules again; we can go straight down and run the block of code we went over. We're going to create an instance of our model, choosing the large model. It's setting up and downloading the weights right now, about 256 megabytes. It downloads them just as it did with the YOLOv8 models, and again, you can choose whichever version of the YOLO-NAS model you want. It's at 86, 87 percent now, so let's just wait. After that, we can go down and generate a summary of our model. We can specify the input size and the names of the columns, and we can do a lot of customization. The summary is useful if you're interested in seeing what the architecture looks like for this specific model, so let's do that, and we should see the output in just a second. Now we see that we've successfully installed torchinfo.
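Choosing between the small, medium and large variants is a speed versus accuracy trade-off. One simple rule, sketched below: take the most accurate variant whose measured latency fits your budget. The (latency, mAP) numbers are hypothetical, and the names yolo_nas_s, yolo_nas_m and yolo_nas_l are used as labels that follow the super-gradients naming; check them against your installed version.

```python
from typing import Optional


def pick_variant(candidates: dict[str, tuple[float, float]], budget_ms: float) -> Optional[str]:
    """Return the most accurate variant whose latency fits the budget, or None if none fits."""
    fitting = {name: acc for name, (latency, acc) in candidates.items() if latency <= budget_ms}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# Hypothetical (latency_ms, mAP) pairs; measure on your own hardware instead.
variants = {
    "yolo_nas_s": (10.0, 0.60),
    "yolo_nas_m": (20.0, 0.70),
    "yolo_nas_l": (30.0, 0.75),
}
print(pick_variant(variants, budget_ms=25.0))
```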
With torchinfo we can get a summary of our model. Here we can see the output: these are all the individual layers in our YOLO-NAS large model. First we have a backbone, and we can see that it consists of RepVGG-style blocks, so we have some convolutional layers organized into different stages. The backbone extracts the features. Then we have some up stages and down stages, so it looks like they're using something like a U-Net-shaped structure there. Finally, we have the heads at the end, which do the classification and predict the bounding boxes. It's nice that we can get a summary like this. We can see the total number of parameters in our model, the input size and the parameter size, so the total size of the model. This is actually a rather small model compared to some of the other object detection models out there. Now let's try to run inference on an example image; this is the image we're going to pass through the model. We have the URL of the image, we call the predict method on our YOLO-NAS model, pass in the URL, and get the results back. This is just a forward pass of the image from this URL. You can pass in a URL, a NumPy array and other kinds of data structures, and it will run a prediction and return the results. There's also a method for showing the results directly on the image, so we just call .show(). We'll do exactly the same thing when we export to our own custom Python script.
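The parameter size in the summary is essentially the parameter count times the bytes per value, which also shows why FP16 and INT8 versions are smaller. A small sketch of that arithmetic; the parameter count is a placeholder, not the value from the YOLO-NAS summary, and real checkpoint files can be larger because they may store extra state.

```python
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2, "int8": 1}


def param_size_mb(num_params: int, dtype: str = "fp32") -> float:
    """Approximate parameter memory in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_VALUE[dtype] / 1e6


n = 50_000_000  # placeholder parameter count
for dtype in BYTES_PER_VALUE:
    print(dtype, round(param_size_mb(n, dtype), 1), "MB")
```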
Now we can see all the detections in this image: we have a TV, some persons, a dining table, a bowl and a few bottles. Even though one bottle is occluded, it is still detected, and we detect all the persons in the image. So the model is running and we can run inference with it. Let's jump further; you can read through this part if you're interested, but I don't want to spend too much of your valuable time. Now we're going to see how to fine-tune the YOLO-NAS model on a custom dataset from Roboflow. We have already copied the export code, so we can paste it directly into the Colab notebook. We can also run inference on a webcam; I'll show you that in a bit, when we export the model trained on our custom dataset, put it into our own Python script and open a video capture with OpenCV, as in all the other videos on this channel. But first, let's look at how to fine-tune it. We set up the trainer, which has some arguments and parameters we need to configure, but we'll skip through those rather quickly. I have all of that covered here on my YouTube channel in a whole playlist about how to train models, what the different hyperparameters do during training, and the most common default values for them, so definitely check that out if you're interested. Here we'll just go through the blocks of code. First, we set up our checkpoint directory and our trainer, so we run this block of code. For the dataset and the dataloaders, scroll down; you can use whatever tool you want.
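Going back to the detections on the example image: a common next step is to drop low-confidence boxes and count what remains per class. The sketch below uses a plain list of (class_name, confidence) pairs as a stand-in; it does not assume anything about the structure of the super-gradients prediction object, and both the detections and the 0.5 threshold are hypothetical.

```python
from collections import Counter


def count_confident(detections: list[tuple[str, float]], threshold: float = 0.5) -> Counter:
    """Count detections per class, keeping only those at or above the threshold."""
    return Counter(name for name, conf in detections if conf >= threshold)


# Hypothetical detections, loosely modeled on the classes in the example image.
dets = [("person", 0.91), ("person", 0.88), ("tv", 0.75), ("bottle", 0.42), ("bowl", 0.63)]
print(count_confident(dets))
```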
You can use prelabeled datasets directly from Roboflow or choose your own; I have my own. Then we have this block of code where we paste in the export code from Roboflow, which I'll do in a second. Make sure you put your own API key here. Then run this block of code, and we can go down and train the model on our custom dataset. I've pasted in my code snippet from the Roboflow export, and now the dataset should be extracted. Yes, we can see it in our files: we have our cup detection dataset with a test, train and validation split, plus some other information, and each split has the images and the labels. Now we're going to use it directly. We need to set up our dataloaders in the COCO detection YOLO format: we preprocess the dataset and specify the correct format to be able to train this YOLO-NAS model. In the dataset parameters we specify the data directory; we're using our own custom one, so we specify the cup detection folder with its version suffix 1. This is the dataset directory, and we have all the directories for our training images, training labels, validation images and so on. We also need to specify the classes down here. I think I have two cup classes in my dataset, one with an uppercase "Cup" and one with a lowercase "cup", so I specify both to make sure we use the same labels. Running this cell gives us a dictionary with all the parameters we'll use later on. Now we can go down and set up our dataloaders.
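The dataset parameters are essentially a dictionary of folder paths plus the class names. A minimal sketch, assuming the Roboflow YOLOv5 export layout (train, valid and test folders, each with images and labels); the key names are hypothetical and should be matched to whatever your super-gradients dataloader expects. Checking the folders up front catches path typos before a long training run.

```python
import tempfile
from pathlib import Path


def build_dataset_params(data_dir: str, classes: list[str]) -> dict:
    """Collect split folders of a Roboflow-style YOLOv5 export and fail early if one is missing."""
    params = {"data_dir": data_dir, "classes": classes}
    for split in ("train", "valid", "test"):
        for kind in ("images", "labels"):
            folder = Path(data_dir) / split / kind
            if not folder.is_dir():
                raise FileNotFoundError(f"missing folder: {folder}")
            # Hypothetical key names such as "train_images_dir"; adapt to your loader.
            params[f"{split}_{kind}_dir"] = f"{split}/{kind}"
    return params


# Demo on a throwaway folder that mimics the export layout.
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "valid", "test"):
        for kind in ("images", "labels"):
            (Path(tmp) / split / kind).mkdir(parents=True)
    demo_params = build_dataset_params(tmp, ["Cup", "cup"])
print(demo_params["train_images_dir"], demo_params["classes"])
```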
We have our train data, validation data and test data. First we have coco_detection_yolo_format_train, which handles the conversion between the dataset formats, so we don't need to take care of that ourselves. We pass in our dataset parameters and our dataloader parameters, where we specify the batch size and the number of workers; here we're going with a batch size of 16, but you could also choose something like 32. Let's run this block of code. I'll close the file browser so you can better see what's going on. Now this block of code is done running; it's caching the annotations, and we can see that it has loaded our data. Next we set up some transformations for the data, because we need to preprocess it before we can pass it into our model. We inspect the dataset defined earlier: the transforms are inside a dictionary, so we need to index into it, which we do in these blocks of code. Then we take train_data.dataset and plot a batch of our training data with the augmentations applied, so we can see what our data looks like before we feed it into the model. This is the training data, and we can see that a lot of data augmentation has been applied, but we can still see all the cups in our dataset. We have 16 images from our batch, and we can also see the bounding boxes around our cups. These are the images we're going to feed through our model, with both the images and the annotations with their bounding boxes. So now we've verified that we have the correct data loaded in. Now we can instantiate the model, start the model training, and then look at the results.
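Augmentations must transform the boxes together with the pixels, or the labels stop matching the image. For a horizontal flip in normalized YOLO coordinates, only the x-center changes: it becomes 1 - x_center. A minimal sketch on a tiny nested-list "image"; it illustrates the idea and is not the super-gradients transform pipeline.

```python
Box = tuple[int, float, float, float, float]  # (class_id, xc, yc, w, h), normalized


def hflip(image: list[list[int]], boxes: list[Box]) -> tuple[list[list[int]], list[Box]]:
    """Mirror each pixel row, and mirror each box's x-center around the vertical axis."""
    flipped_image = [row[::-1] for row in image]
    flipped_boxes = [(c, 1.0 - xc, yc, w, h) for c, xc, yc, w, h in boxes]
    return flipped_image, flipped_boxes


img = [[1, 2, 3, 4], [5, 6, 7, 8]]
boxes = [(0, 0.25, 0.5, 0.5, 1.0)]
print(hflip(img, boxes))
```

Width, height and y-center are unchanged by a horizontal flip, and flipping twice restores the original.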
We're going to create the model exactly the same way as we did at the top, with the YOLO-NAS model, but now we set the number of classes to the length of our classes list. For the pretrained weights we use COCO, so we're using transfer learning: we fine-tune the model on our custom dataset instead of training it from scratch, starting from weights pretrained on the COCO dataset. You'll probably always want to do that for your custom datasets. Let's run this and create an instance of our model. Then we define the metrics and the training parameters. We won't go into detail with those; again, I have all of this covered in a whole playlist in my other videos. In the training parameters we have the warmup mode, the learning rate warmup epochs and the initial learning rate. We also have the optimizer: we're using the Adam optimizer with a learning rate and weight decay, among other parameters. We also have the number of epochs. It should probably be okay to run with these default parameters on most of your custom datasets; the most important parameter here is probably max_epochs. We're going to train for 10 epochs in this example. If you want higher accuracy, or your model has not finished converging, you can increase the number of epochs; that also depends on how large your dataset is and how complex your dataset and task are. We set up the loss and all the other settings, which again I won't go into detail on, and we're using 10 epochs. Let's run this block of code and make sure everything is set up. Now we can go down and train the model.
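The warmup and learning-rate settings together describe a schedule. A common shape is a linear warmup from a small starting rate up to the initial learning rate, followed by cosine decay down to a fraction of it. The sketch below assumes those semantics; super-gradients' own warmup and lr_mode options may differ in detail, and every number is a placeholder.

```python
import math


def lr_at(epoch: float, max_epochs: int, initial_lr: float, warmup_epochs: int,
          warmup_initial_lr: float, final_ratio: float) -> float:
    """Linear warmup, then cosine decay from initial_lr to initial_lr * final_ratio."""
    if epoch < warmup_epochs:
        frac = epoch / warmup_epochs
        return warmup_initial_lr + frac * (initial_lr - warmup_initial_lr)
    progress = (epoch - warmup_epochs) / max(1, max_epochs - warmup_epochs)
    final_lr = initial_lr * final_ratio
    # Cosine goes from 1 at progress=0 to -1 at progress=1.
    return final_lr + 0.5 * (initial_lr - final_lr) * (1 + math.cos(math.pi * progress))


cfg = dict(max_epochs=10, initial_lr=5e-4, warmup_epochs=3, warmup_initial_lr=1e-6, final_ratio=0.1)
for e in range(cfg["max_epochs"] + 1):
    print(e, f"{lr_at(e, **cfg):.2e}")
```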
We have set up the trainer, defined the dataset parameters, set up the dataloaders, created an instance of our model and set up the training parameters, so now we're ready to train. This is really easy to do: we call the train method on our trainer and specify the model, the training parameters, the training data we want to train on, and the validation data we want to validate on during training. Let's run this block of code, and now our model is going to train. You can see how easy it is. Again, this notebook will be available in the description, so you can take it directly and train on your own custom dataset with this new state-of-the-art object detection model called YOLO-NAS. As you follow this training process, you first get some information about the setup, such as the batch size per GPU, the number of GPUs you're using and the iterations per epoch, and after that you can see the training accuracy and other metrics per epoch. This is just awesome: we can take a notebook, export our dataset from Roboflow without writing any code at all, run the whole notebook, and end up with a custom object detector that we can export and use in our own applications and projects. It has never been easier to train these models, use them, and deploy them in different situations in production and in real life. Now our custom model is done training; it took around eight minutes to run through 10 epochs. Over here to the right we can see how the mAP@0.50 increases over time: we start at a really low mean average precision, it increases over the epochs, and it ends up at 0.995, which is pretty good.
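The iterations per epoch in the training log follow from the dataset size and batch size: with N training images and batch size B, one epoch takes ceil(N / B) steps, or floor(N / B) if the last partial batch is dropped. A quick sketch with hypothetical counts:

```python
import math


def iterations_per_epoch(num_images: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of optimizer steps needed to go through the training set once."""
    if drop_last:
        return num_images // batch_size
    return math.ceil(num_images / batch_size)


# Hypothetical: 70 training images, batch size 16, 10 epochs.
steps = iterations_per_epoch(70, 16)
print(steps, "steps per epoch,", steps * 10, "steps in total")
```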
Even after just five or six epochs we already had a pretty good object detection model, and it only took about five minutes to get to that stage. Now let's go down and see how to get the best model trained over the epochs. There is a checkpoint with the best weights and one with the last weights, and there is also an averaged checkpoint, which averages the weights of several of the best epochs from the training run. You can use that averaged model instead of just the best one or the last one. If we go over to the file browser on the left, we can find the checkpoints for this first YOLO-NAS run: the average model, the checkpoint for the best model, and the checkpoint for the latest model. I'm going to download the best one so we can use it later in our own custom Python script. Now we create an instance of the best model: we choose YOLO-NAS, set the number of classes, and specify the path to our checkpoint. Instead of passing the pretrained weights argument, we use the checkpoint path argument. Let's create an instance of our best model. Then we can call other methods on it; for example, we can use the trainer to test the best model, and we can also run inference with it directly, as we did in the examples above. To start with, we evaluate our best trained model by passing in the best model and the test dataset.
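The three checkpoints can be illustrated with a toy example: the latest checkpoint is the last epoch, the best is the epoch with the highest validation metric, and the averaged model takes the element-wise mean of the weights from several top-scoring epochs. Weights are plain float lists here and k = 3 is an assumption; this mirrors the idea rather than super-gradients' exact averaging code.

```python
def select_checkpoints(history: list[tuple[float, list[float]]], k: int = 3):
    """Return (latest, best, averaged) weights from per-epoch (metric, weights) records."""
    latest = history[-1][1]
    best = max(history, key=lambda rec: rec[0])[1]
    top_k = sorted(history, key=lambda rec: rec[0], reverse=True)[:k]
    # Element-wise mean over the k best epochs' weight vectors.
    averaged = [sum(ws) / len(top_k) for ws in zip(*(w for _, w in top_k))]
    return latest, best, averaged


# Hypothetical validation mAP@0.50 and two-parameter "weights" per epoch.
history = [(0.40, [0.0, 1.0]), (0.80, [1.0, 2.0]), (0.90, [4.0, 3.0]), (0.85, [1.0, 4.0])]
latest, best, averaged = select_checkpoints(history)
print(latest, best, averaged)
```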
Here we see some metrics: the classification loss, the IoU loss and the overall loss. We also have Precision@0.50, Recall@0.50, mAP@0.50 and F1@0.50. I would probably also want to look at mAP@0.50:0.95, which averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05. Still, we can see a pretty nice F1 score, good precision and recall, a very high mean average precision, and a pretty low loss. So with all these metrics we can evaluate our model. Next we can run predictions with our best model. We could take an image like this one, but I'll show you that when we export the model and run inference in our own custom Python script, because that might be a bit more fun. You could also go up to the top, or take any other image. Let me create another block of code and show you how to do it. I'm going to copy-paste the code directly from the custom Python script that we'll look at in just a second. We'll use cv2, that is OpenCV, for that, so we import it to start with, and we also import numpy as np just to make sure we have it. When we call the predict function, we can also pass in a NumPy array, so we can use it as we do in all the other computer vision tutorials on my channel. Then we show the image at the end. Let's run it through and first see if we can actually get the output.
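The metric names in the evaluation output rest on a few formulas: IoU is overlap area divided by union area, a prediction is a true positive "@0.50" when its IoU with a ground-truth box is at least 0.5, F1 is the harmonic mean of precision and recall, and mAP@0.50:0.95 averages AP over ten IoU thresholds. A stdlib sketch of those pieces (the full AP computation is left out), with hypothetical boxes:

```python
def iou(a: tuple[float, float, float, float], b: tuple[float, float, float, float]) -> float:
    """Intersection over union for (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# The ten IoU thresholds averaged by mAP@0.50:0.95.
thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

pred, gt = (0, 0, 10, 10), (5, 0, 15, 10)
print(iou(pred, gt), f1(0.9, 0.8), thresholds)
```

The example boxes overlap with an IoU of 1/3, so that prediction would not count as a match at any of the ten thresholds.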
First we use imread and specify the path to read from: we take an image from the test split, choose the first one, copy its path, and paste it into the imread call. Then we close the file browser again. Now we read in the image with OpenCV and store it in the img variable. We don't want to show it yet; first we convert it from BGR to RGB and pass the image through our model by calling .predict(). Let's make sure we're using the correct model, the best model, and then we get the outputs. Let's run this and see if it works. It does. Let's print the outputs and see what we get; we should get a results object back. There we go. And if we go up to the top, we can see that after calling predict, we can call .show() directly on our output. So here we call outputs.show(), run it, and see the results directly in our Google Colab notebook. This is really nice: we have been able to train a model on our custom dataset, we get really nice bounding boxes around our cups, and we also get pretty high confidence scores. This model only trained for about eight minutes. It's really easy: you can run through the whole notebook and do exactly the same thing on your own custom dataset. Go down in the description, get the notebook and go through it; you'll learn a lot just by training your own deep learning and object detection models. As promised, we're now going to see how to set it up in our own custom Python script. We've exported the model from Google Colab, and we import the modules: PyTorch, OpenCV and super-gradients.
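OpenCV's imread returns pixels in BGR order, while most other tools expect RGB. In practice the conversion is cv2.cvtColor(img, cv2.COLOR_BGR2RGB), or img[..., ::-1] on a NumPy array (add .copy() if a library such as PyTorch's from_numpy rejects the negative-stride view). Conceptually it just reverses the channel order of every pixel, shown here in plain Python on a hypothetical 1x2 image:

```python
Pixel = tuple[int, int, int]


def bgr_to_rgb(image: list[list[Pixel]]) -> list[list[Pixel]]:
    """Reverse the channel order of every pixel: (B, G, R) becomes (R, G, B)."""
    return [[pixel[::-1] for pixel in row] for row in image]


bgr = [[(255, 0, 0), (0, 0, 255)]]  # pure blue, then pure red, in BGR order
print(bgr_to_rgb(bgr))
```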
With super-gradients we get the models: we can either set up a pretrained one with the COCO weights, as we did in the notebook, or use the one from our checkpoint with the best weights. We can also choose whether to use CUDA or the CPU. Right now I'm on my MacBook, so we have MPS available; if you don't have that, it will just use the CPU, and if you have CUDA on your computer, it will use that and run significantly faster. We're definitely going to create some videos in the future where we use my other computer with my 4090 GPU and CUDA to see how fast these models can run and what applications we can build around them, maybe with some real-time inference comparisons between YOLOv8 and the new YOLO-NAS models. So definitely stay tuned for that, and remember to hit the subscribe button and the bell notification so you get notified when I upload new videos with the YOLO-NAS models; I'm definitely going to cover them much more in future videos, because this is an awesome model. Right now we move our model to the device. There's also this predict_webcam method, which I'll show you in just a second; it lets us run live inference on our webcam with a single method. First, we do exactly the same thing as in Colab: we read a test image, convert it from BGR to RGB, pass it through the predict method, and call outputs.show() to see the results in our own custom Python script. Or we can take the results and do whatever we want with them, since we get an output object we can extract the information from.
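The device fallback described here (CUDA if present, otherwise Apple's MPS, otherwise the CPU) fits in a small pure function. In a real script the two flags would come from torch.cuda.is_available() and torch.backends.mps.is_available(); they are passed in as booleans here so the logic runs without PyTorch installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer an NVIDIA GPU, then Apple Silicon's Metal backend, then the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# With PyTorch installed, this would typically be:
#   device = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
#   model = model.to(device)
print(pick_device(False, True))
```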
Let's run it. We open up a terminal, and the script reads in the image, sets up the model, passes the image through the model to make a prediction, and then calls .show() on the output, so we should see the results in just a second. Here are the results: exactly the same as in the Google Colab notebook. This shows how we can export the model and use it in our own custom Python script, and then in your own applications and projects. Now let's close this one down, comment out these blocks of code, and try it on our live webcam. I copy-paste those lines and comment this one in; here we just have model.predict_webcam(). When we run it, it opens up the webcam, and we should be able to see how it works. The live webcam is now running. It's at a really low FPS because I'm running it on the CPU of my MacBook, but again, we're going to create some other videos where we run this on my 4090 GPU. Here it's detecting me as a cup, probably because our environment is pretty static without much variation; we basically just captured images of my table. Let's try moving the cup over here to the right and see if it's able to detect it. Maybe it's not, because we still have a very specific dataset for this specific setup, and we only have about 80 images in it. We're definitely going to create more videos in the future with the new YOLO-NAS models, which have state-of-the-art performance for object detection. We're going to use them for different applications, and I'm excited to run them with CUDA on my Windows computer with my other graphics card, maybe build some applications around them and see how they perform on a real live webcam feed compared to YOLOv8 and maybe some other YOLO models, running on the GPU in our own custom Python script.
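To put a number on the low webcam FPS, you can time each frame and smooth the result. A minimal sketch of an exponential-moving-average FPS meter; the demo timestamps are synthetic, and in a real capture loop you would call tick() once per frame without arguments so it uses time.perf_counter().

```python
import time
from typing import Optional


class FpsMeter:
    """Exponential moving average of frames per second."""

    def __init__(self, smoothing: float = 0.9):
        self.smoothing = smoothing
        self.last: Optional[float] = None
        self.fps: Optional[float] = None

    def tick(self, now: Optional[float] = None) -> Optional[float]:
        """Record a frame at time `now` (seconds) and return the smoothed FPS so far."""
        now = time.perf_counter() if now is None else now
        if self.last is not None and now > self.last:
            instant = 1.0 / (now - self.last)
            self.fps = instant if self.fps is None else (
                self.smoothing * self.fps + (1 - self.smoothing) * instant)
        self.last = now
        return self.fps


meter = FpsMeter()
for t in [0.0, 0.5, 1.0, 1.5]:  # synthetic: one frame every 0.5 s
    fps = meter.tick(t)
print(fps)
```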
So thank you guys for watching, and I hope to see you in the next one. Bye for now.

Info

Channel: Nicolai Nielsen

Views: 14,979

Keywords: object detection, deep learning, yolo, object detection deep learning, object detection python, object detection tutorial, yolov8 vs yolonas, yolonas, yolo nas, state of the art object detection, yolov5 vs yolov8 vs yolo-nas, yolo-nas, how to train custom object detection model, real time custom object detection, yolo-nas custom object detection, custom object detection yolo-nas, yolo-nas google colab, yolo-nas vs yolov8, best object detection model, ai, computer vision, deci ai

Id: PBh9MFH2lB4

Length: 25min 30sec (1530 seconds)

Published: Thu May 04 2023