YOLOv5 Object Detection: How to Train a Custom Model with Code and Dataset

Video Statistics and Information

Captions

Here we get a confidence score of 0.81. We're detecting the mug again, we're detecting the mug, we're drawing really nice bounding boxes around the mug, and they are really close to the ones I drew in Roboflow.

Hey guys, welcome to a new video in this neural networks and deep learning tutorial. In this video we're going to create a small project where we use the YOLOv5 neural network to create a custom object detector. We're going to do everything from scratch, and it's actually really easy to set up. I'll provide the Colab notebook, so you can just go to my GitHub link in the description, get the code and train your own custom object detector. I'm going to show you from scratch how we can generate a dataset, how we can annotate and label the dataset by drawing bounding boxes, how we can load the dataset into Google Colab, and then train our YOLOv5 neural network to do custom object detection.

But first of all, remember to join the Discord server; I'll link it in the description. You can join the channel and chat about computer vision, deep learning, AI and so on. You can also become a member of the channel if you want to support it with a small fee; everything will go to creating more and better quality content here on the channel. Also, if you have problems in your own projects or applications, you can become a member and I can help you out and guide you in the right direction. So thank you, guys.

First of all, we're going to start in the YOLOv5 GitHub repository from Ultralytics. Here we can see the newest model of YOLOv5, and we get a short description of what is going on, some documentation, how to set it up, and which libraries and frameworks we can use it with. We can see that YOLOv5 is a family of object detection architectures and models pre-trained on
the COCO dataset, and then we can use transfer learning to train it on our own objects. There are a lot of different variations of the YOLOv5 models we can use, from small models to large models, depending on your application and the complexity of the objects that you want to detect in your images. We see how to install it here: we can just pip install the requirements from GitHub. Under inference we can see how to implement and deploy the models that we've trained, with OpenCV, NumPy, PIL and so on. Basically, we load our model with PyTorch, take our images, call the model and pass the image into it. That gives us the result, and when we print the results we get the bounding boxes of the objects that are detected in the image.

In another video we're going to create a project where we load the model into OpenCV so we can do live object detection, tracking objects from our YOLO model around in the image frame. That is just to show you how we can implement and deploy this in our own projects and applications after we've trained it on our custom dataset.

There are a lot of different tutorials on the GitHub as well, both for inference and training, and also a tutorial on how to train on a custom dataset, as I'm going to show you throughout this video. We can see tips for best training results, Weights & Biases logging, which covers how we can log the training of our neural network, multi-GPU training, and Roboflow for dataset labeling and active learning, as I'm going to show you as well. We're going to use Roboflow to upload our dataset to, so we can then label and annotate our data, so we can
actually draw bounding boxes around all the objects that we want to detect in the image frame. Then we're going to generate our dataset with Roboflow, load it into Google Colab, train our YOLO model on the GPUs from Google Colab, and then we can export the model after we have done validation and seen the results of our object detector. We can also use transfer learning with frozen layers, hyperparameter evolution and so on. We can see the different environments, Google Colab, Kaggle and so on, and the integrations.

Then we can also see all the different models down here. We have different types of YOLO models, depending on whether you want better performance or just want it to run faster. There's a trade-off between accuracy and speed, so you need to choose the model that fits you best. Down here we see the metrics for all the different pre-trained models. In this video we're going to use YOLOv5s, the small model. We can see the pixel size that it takes as input and all these different metrics: speed, accuracy, the number of parameters and so on. So we can test out different models depending on your application, how complex your project or objects are, and whether you want to detect multiple objects in your image compared to only one object. In this video we're going to detect the mug, so we will only detect one object in the image frame, but you can have multiple objects; you just need to label them with Roboflow, as I'm going to show you.

So now we're jumping straight into Roboflow. You can just go in, sign up and get a free trial. Then we can use this mug detector; I already created a project for that. We can go into the overview and see some different things, and also go to upload, where we can upload new images from
our dataset. We can go in and annotate them as well. We can see here that I've already annotated, or labeled, all these images. We can then split them into a training set, a validation set and a test set. If we go into one image, for example, we can see that I have drawn this bounding box around the cup. When you go in here, you can change it: you get this tool up here and you can draw a bounding box around the object. Then you get a couple of options, so you can save the class. In this example I just called it mug, and when we hit enter it creates this class called mug. If you have multiple objects in the image frame, for example if I wanted to detect a person too, we can create a new class called face, and when I hit enter it creates this face class as well. Roboflow then does all the rest for us: it generates the files in the format needed to train the YOLO model. But here we're just going to detect the mug.

I already did the annotation on all the images, so we can go further. We see the bounding boxes that are drawn around the mug that we want to detect. Here we just have 70 images in our dataset, which is actually fairly good; we'll get some really nice results with only 70 images. You can also add more, but this is just to show you guys how to get started with creating a custom object detector with YOLOv5. After we have uploaded our images and annotated them, we can go into our dataset and generate a new version of it. I've already created one, but now we're going to create another one so I can show you how to do it. First of all we need to choose the train and test
split. In this example you can go up here and rebalance it. Right now we're choosing 20 for validation, and all the other images we want to use for training, because we don't have that many images in our dataset. We're not going to use anything for test, because when we test it we're just going to use our validation images. If you have several hundred images, you can split it differently so you have more validation data and also test data. But in this video we're going with 70 training images and 8 validation images, and even though we have this low number of images in our training set, we will still get a really nice object detector, as you're going to see. So we hit save, and this is our train/test split.

Then we go down to preprocessing, where we can choose some different preprocessing steps. Right now it automatically does auto-orientation and also resizes the images to 416 by 416, which is just the input size to the YOLO model that we need. We can also add other preprocessing options. For example, we could grayscale all the images before training our neural network, to make it faster and reduce the complexity a bit. But right now we have this mug, which is yellow, so we might benefit from having RGB images compared to grayscale images when we do this object detection, at least when we have this low number of images.

After we've done the preprocessing, we can go down and choose some augmentation steps. For example, we can do a horizontal flip or a vertical flip. A vertical flip doesn't really make sense in this example, because we don't want the mug to be upside down, so
in this example we're just going to use flip as a data augmentation method, and also rotation, where we can choose the angle that we want to rotate our images by. It's really important that you use augmentation if you have a low number of images. In this example we only have 70, but as you're going to see, we get a lot more images when we use augmentation, and we also get more variation in the images that we train our neural network on. Just to show you some of the other augmentation options we have: we can rotate our images 90 degrees; we can crop them, or zoom in and out, if we want our model to generalize more, for example if it doesn't matter where the object is in the image frame or in depth; we can also shear, grayscale, and play around with some of the different color spaces. So we can do a lot of different things. We can also apply blur and noise to the images if we want a more robust neural network. Here we're just going to continue with a horizontal flip and a rotation between minus 30 degrees and plus 30 degrees.

Now we can go down to the last step, which is to generate our dataset. We need to choose the maximum version size. For the free version you can get up to 3x the number of images that you have when you apply data augmentation. With other plans you can get a lot more images from data augmentation. Right now we're going from 78 source images to 218 images after we have applied data augmentation, so we create a
more robust and general model. Now we hit generate and it generates our whole dataset. We can go up here and call it mug dataset 2. We save the name, and it processes and generates the images. It took about five seconds to generate the whole dataset with the data augmentation. We can also see all the images that we have, the 216. These images are flipped, and we can also see that we have images that are rotated. So right now we have all our images in our dataset: we have 288 images in our train set and eight images in our validation set. That is not really that many images in our validation set, but as we're going to see when we test it, we're still able to detect the mugs in our images, even though we only have 280 images that we're training on.

We could train a model right here in Roboflow, but that costs money, so we're going to go to Colab, where I can show you how to train a custom YOLO model with this dataset from Roboflow. Right now we have our dataset generated, so we can go in here and export it. When we export, we choose the format, and for training YOLOv5 we need the YOLO v5 PyTorch TXT format. Also, if you're using datasets from the internet that you've downloaded to your computer, make sure you know what format they are in. You can convert them in Roboflow as well: if you upload the dataset you can, for example, convert from Pascal VOC format to YOLO v5 PyTorch TXT format so you can train your custom YOLOv5 model. You can either download the dataset to your computer as a zip file, or you can show the download code that we can
then run directly in Google Colab to download the dataset into our notebook. Here we're going to show the download code. When I hit continue, it starts by zipping the files, and then we get this code snippet that we can copy. It contains a private API key; the snippet goes in and gets the dataset from Roboflow, and then we can store our dataset in this variable in our code. That is all we have to do in Roboflow: we have this code snippet that we copy and then paste into Google Colab.

So now we've jumped into Google Colab, and we're going to do the whole custom object detection with the mug. We started from scratch: we generated the images on our computer, then we used Roboflow to annotate and label the images with all the bounding boxes, did some data augmentation, generated the dataset, and now we have copied the code snippet so we can download our dataset into Google Colab. As we can see, we have no files in the directory right now, but when we download both YOLOv5 and the dataset, we'll get all of these things in our folders.

So let's set it up. First of all, we clone the YOLOv5 repository that I just showed you on GitHub, and we install the requirements. These are just commands from the GitHub tutorial for YOLOv5. Then we import torch, os and also IPython display, so we can display the images later on with the detections of the mug. If you're able to use CUDA, we're going to use the CUDA device, a GPU, to train our custom YOLOv5 model. You need to make sure
you go up to Runtime, Change runtime type, and choose GPU as the hardware accelerator. Then it will train the model on the GPU, and it will go way faster than using the CPU. Training your custom YOLOv5 model on a CPU would be far too slow, so I just use the GPU here in Google Colab.

So we run the block of code and it downloads YOLOv5; we're just cloning the repository. When we go into the files over here to the left, we can see we get this yolov5 repository. We see the data, the models, the requirements that we're going to download, the training script, the validation script and so on, and also the detection script. So we're going to do detections with just a Python script that we call with some different arguments.

Now we go to the next block of code. Here we set up Roboflow: we create our Roboflow object with the model format yolov5, which is just the format that we stored our dataset in, and the notebook will be ultralytics. We run this block of code after the first one has completed. Up here we can see that the setup is complete: we're using torch 1.10.0 plus CUDA 11.1, and we also get information about which GPU we're training on.

Now we set our environment variable to the dataset directory, which will be /content/datasets. This is just the folders that we have over here to the left. Then we can go down and paste in the code snippet that we got from Roboflow. We've already installed and imported roboflow, so we delete those lines of code. This is the private API key; you can also just use it if you want to use the exact same
mug detector as I did, if you don't want to generate your dataset beforehand and just want to test this mug detector on your own images. Then we set up the project and get the dataset. We need to choose the version, and the version we're going to use is 1. This is version one of our mug detector, because I already had one, which is version zero, and we created a new one that we're going to train on.

Now we run the block of code. We can see that we're loading the Roboflow workspace, loading the Roboflow project, and then downloading the dataset as a zip file into /content/datasets, in a mug detector 1 folder, in YOLO v5 PyTorch format. We can also see the progress bar. So now we have downloaded and extracted our dataset. If we go over to the folder on the left, we can see our datasets folder up at the top, with mug detector 1 inside. We get all the training images and all the corresponding labels with the bounding boxes. We also have our validation images, in the same layout, but in validation we only have eight images. If I open this folder, we can see the images that we're going to do validation on. This is all we need. We also get all the labels: if we take the validation labels and open one of these files, we get the bounding box that we annotated in Roboflow, as I showed you. So we close the folders again. Now we have our dataset, and we can train the YOLO model with the training Python script that is provided by YOLOv5 on GitHub and that we cloned with the repository.
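
The label files mentioned above store one object per line in the YOLO text format: a class index followed by the box center, width and height, all normalized to the image size. As a rough sketch of what one of those lines means, the helper below converts a label line back to pixel corner coordinates for a 416 by 416 image. The function name and the example label values are hypothetical and not taken from the video's dataset.

```python
def yolo_label_to_pixels(line, img_w=416, img_h=416):
    """Convert one YOLO-format label line to pixel corner coordinates.

    A label line is: class_id x_center y_center width height,
    with the four box values normalized to the range 0..1.
    """
    parts = line.split()
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:5])

    # Scale the normalized center/size values to pixels and turn them into corners.
    x_min = (x_center - width / 2) * img_w
    y_min = (y_center - height / 2) * img_h
    x_max = (x_center + width / 2) * img_w
    y_max = (y_center + height / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max


# Hypothetical label line for class 0 (the only class in a one-object dataset).
example_line = "0 0.5 0.5 0.25 0.5"
print(yolo_label_to_pixels(example_line))
```

With a single class such as the mug, every line in every label file would start with class index 0.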
So we run this Python training script. We set the image dimensions, the batch size that we want to use, the number of epochs that we want to train our model for, and then we need to specify the location of our dataset, which will be dataset.location/data.yaml. This file just points to the directories for the images and labels of our dataset. Then we need to choose the weights. In this video we're going to use yolov5s, the small model. There is an even smaller model, but in this video we're going to use the small one, so we have yolov5s.pt here. Then we specify cache, because we want to cache the images in memory when we're training, to try to speed up the process, which we benefit from when we're running 100 epochs.

Right now we run this block of code and it trains the whole YOLO model. You can see that it's downloading some different things and it sets up the hyperparameters, with the learning rate, momentum, weight decay, warmup epochs and so on. It downloads the weights and then it trains the YOLO model. Now we can see down here that it has downloaded everything and has started training the neural network. First of all we get a summary of the model, which is all the layers in the YOLOv5 small model. We get the exact dimensions of all the layers: we have some convolutional layers, some upsampling layers, some concatenation layers and so on. But if we go down to the bottom, we get all the epochs that we're
running. So now our training has started, and we get information about the GPU memory that we're using, the box, object and class losses, the number of labels, the image size, and also the precision over here to the right. These values, 0.5 and 0.95, are the IoU thresholds for the mAP metrics of the bounding boxes that we're predicting on our images, and these metrics should increase over the number of epochs. We can also see that it takes around seven seconds to run through each epoch, so it doesn't take that long to train a custom YOLO object detector on the GPU, on the dataset that we just created ourselves, and we end up with some really nice results. This just takes a couple of minutes to train, and then we have our model and can do predictions with it later on.

So now the custom YOLOv5 model is done training. We can see that 100 epochs completed in 0.23 hours. We see the optimizer stripped from both the last weights and the best weights, and then it is validating the best weights that it found during training. We're also going to see the training results down here in a second, but we can see we got a model summary of 213 layers, the number of parameters, the images, labels and so on. We see the results are saved to runs/train/exp. If we go into our yolov5 folder, then runs, train and exp, we get all the weights: both the best model and the last model from our training. When we're going to do inference and predictions on new images, we're going to use the best model that it found during training. We also get some other information about the results: training batches, validation batches, some
labels, a confusion matrix, so we can see the results. But here we're going to close the folders again, and then we're going to use TensorBoard to look at the different metrics and values that were logged during training. We load TensorBoard and pass in the log directory, which is runs, so we display all the training results that we got during training of our custom YOLOv5 object detector. We run this block of code, it launches TensorBoard, and then we get all the different metrics and graphs of our training results.

As we can see, it is loading. We get all the different metrics; I'll zoom out a bit so we can see what is going on. These are all the metrics from the exp run down here, which is just the first training. We start at epoch zero and go all the way up to 100 epochs, so we can see the results over time. For the mAP metric at an IoU threshold of 0.5, we have really nice convergence at the end, where we get a value of 0.995, so we have really high accuracy on this metric. For the mAP metric over IoU thresholds from 0.5 to 0.95, it fluctuates a bit more during training, but we still get a really nice result and end up at around 0.80. We can also see the precision over here; it also converges towards one at the end, after around 50 epochs. It's the same with the mAP metric over here to the left, which converges after maybe 40 to 50 epochs, and the same down here with the recall metric. So these are just the training results that we got from training our neural network, and it only took about 10 to 15 minutes to train this whole
neural network on our custom dataset. You can do all of it yourself and use the GPU in Google Colab. Everything will be down in the description; I'll link to my GitHub, where you can get everything, and also the dataset from Roboflow.

Here we can also see the box loss, which keeps decreasing. Our class loss is just constant, and our object loss also decreases over the number of epochs. We want to get the losses as small as possible. As we can see, the box loss still decreases a bit after 100 epochs, so we could maybe run it for 120 or 150 epochs, but we're already at a very low loss, so it wouldn't matter that much. It's also just the box loss, so it would only affect the precision of the bounding box that we're drawing around the object and not really whether we can detect the object or not.

Down here we're now going to do predictions. Again we have a prediction script provided by YOLOv5, so we run this detection Python script. We set the weights of the model that we want to run the predictions with: we go into runs, train, exp, weights and choose the best weights from our whole training process of 100 epochs. We set the image dimensions and also the confidence score. With a confidence threshold of 0.1, if there's more than a 10 percent chance that this is a mug, we say that this is a correct prediction and display it. We can play around with this confidence score and set it to 20, 30, 50 percent and so on, but right now we're going to maybe go with 0.2. Then we choose the source of the images that we want to do predictions on. Again, we don't have any test images, so we're going to run our predictions on our validation
images. Then down here we have this for loop where we run through all the images in our detections. First we run this block of code, and it does detection on all the images in our validation set. Again, we had eight images, as we can see here. Now we have done the prediction on all those images, and we can go down and display them. We see the results are stored in runs/detect/exp, and we also get the speed and so on: how long it took to process each individual image and how many mugs we detected in each image frame. We can see that in all the images we were able to detect one mug. If we had two mugs in the image, it would probably return two and draw bounding boxes around both of them as well. We can test that when we deploy the neural network.

So make sure you hit the subscribe button and the bell notification, so you get a notification when I upload a new video where we deploy these YOLO models with OpenCV. We'll open up the webcam and move the mug around in the image frame, trying to track it, and maybe also have multiple mugs, so we can see: we have only trained our neural network on images with one mug, so is it capable of detecting multiple mugs in an image, and how reliable and robust is the solution? Even though we only started out with 70 images, we augmented them so we now have about 200, then we trained a model for 15 minutes. So it took 20 to 30 minutes in total, and then we have a custom object detector that we can deploy in our own projects and applications. But now we have run all
the predictions and saved the results, so now we can go down here and go into /content/yolov5, then runs, detect, exp, take all the JPEG files inside this folder, and run through and display them with the corresponding bounding boxes. We run this block of code and it displays all the images in our validation set. I'll scroll up to the top. Here we can see it detects a cup, a mug, up here to the right. We can't really see the confidence score because it's outside the image frame. But here we're detecting the mug even though we can only see part of it, because my hand is blocking the mug in the image, and we still get a confidence score of 70 percent in this image. So it's a really nice object detector that we created with only 70 images; it's a really efficient and robust neural network model that we can use. Here we get a confidence score of 0.81. We're detecting the mug again, and again, and we're drawing really nice bounding boxes around the mug that are really close to the ones I drew in Roboflow, where I annotated and labeled the images with the bounding boxes, as I showed you. We can see we're able to detect the mug in all the images, but we get a lower confidence score if we can't see the whole mug, as we can see up here in this image, compared to when we can see the whole mug in the image.

Now we're going to save the model. We can go into runs, train, exp, weights, best; we take the best model and download it from Google Colab. We can download it from our files, and then we can
deploy it in our own Python script with OpenCV or some other frameworks or libraries, loading in images as NumPy arrays and displaying the detections with a live webcam, for example. We're going to do that in another video. So we save the model here. We can see that we're downloading the best model; it downloads to your computer, and then we can import it later on in our own Python script.

So that's it for this video, guys. We've been through how we can generate a dataset, label the dataset with bounding boxes, import that dataset into Google Colab, get a YOLOv5 model, train the YOLOv5 model on our dataset, and then do predictions on new data that it hasn't seen before. It took 20 to 30 minutes, and we created a custom object detector. You can use arbitrary objects, and you can have multiple objects in your image frame; you can even have 100 objects that you want to track in the image frame. You just need to label all the objects that you want to detect and then train your model on that. So it's really nice, it's really cool, and we're going to create some really nice applications and projects with it on this channel in the future. So again, make sure to hit the subscribe button and the bell notification under the video, and also like this video if you like the content and want more in the future; it really helps me and the YouTube channel out in a massive way. As we can see down here at the bottom, it has now downloaded the best model. We can show it in the folder, and then we can import this .pt file later on in our own Python script. If you're interested in other tutorials about deep learning, computer vision and so on, I'll link to more of my tutorials up here, or else I'll see you in the next video, guys. Bye for now.
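
The training and prediction steps recapped above both come down to calling the YOLOv5 repository's `train.py` and `detect.py` scripts with a handful of arguments. The sketch below only assembles those command lines so the arguments are easy to see; it does not run anything. The helper names, the placeholder dataset paths and the batch size of 16 are assumptions for illustration (the video does not state the batch size), while the image size of 416, the 100 epochs, the `yolov5s.pt` weights, the cache option and the `runs/train/exp/weights/best.pt` location follow the walkthrough.

```python
import shlex


def build_train_command(data_yaml, img=416, batch=16, epochs=100, weights="yolov5s.pt"):
    """Assemble the argument list for YOLOv5's train.py (batch=16 is an assumed value)."""
    return [
        "python", "train.py",
        "--img", str(img),        # input image size used when generating the dataset
        "--batch", str(batch),    # batch size (assumption, not stated in the video)
        "--epochs", str(epochs),  # number of training epochs
        "--data", data_yaml,      # data.yaml pointing at the image and label folders
        "--weights", weights,     # pre-trained checkpoint to start from
        "--cache",                # cache images to speed up training
    ]


def build_detect_command(weights, source, img=416, conf=0.1):
    """Assemble the argument list for YOLOv5's detect.py."""
    return [
        "python", "detect.py",
        "--weights", weights,     # trained weights, e.g. the best model from training
        "--img", str(img),
        "--conf", str(conf),      # minimum confidence for a detection to be kept
        "--source", source,       # folder or file with images to run predictions on
    ]


# Placeholder paths; replace them with the real dataset location.
train_cmd = build_train_command("path/to/dataset/data.yaml")
detect_cmd = build_detect_command("runs/train/exp/weights/best.pt", "path/to/dataset/valid/images")
print(shlex.join(train_cmd))
print(shlex.join(detect_cmd))
```

To actually launch either step, the returned list could be passed to `subprocess.run(cmd, cwd="yolov5", check=True)` from the folder that contains the cloned repository.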

Info

Channel: Nicolai Nielsen

Views: 39,726

Keywords: yolov5, yolo neural network, yolov5 custom object detection, yolov5 object detection, yolov5 tutorial, object detection yolo, object detection pytorch, object detection python, opencv object detection, opencv yolov5, opencv python yolov5, object detector, train an object detection model, object detection yolov5, the coding lib, opencv, roboflow, roboflow dataset, yolov5 dataset, neural networks, deep learning, detect objects with yolov5

Id: rZyY2pNzypQ

Length: 30min 39sec (1839 seconds)

Published: Mon Nov 22 2021