Auto Label your Custom Dataset with Autodistill for YOLOv8

Video Statistics and Information

Captions
Hey guys, and welcome to a new video. In this video we're going to take a look at the new framework from Roboflow called Autodistill. It uses foundation models like Segment Anything and Grounding DINO to label your dataset automatically, so you don't have to do anything. Say we have a custom dataset we want to train an object detector on: we just have the images, we don't have to label them. We throw them through the Autodistill framework, it labels the images for us, and then we can train our own custom object detector, like a YOLOv8 model, on our own custom dataset without doing any labeling. So let's jump into the GitHub repository and take a short look at it before we jump into our Google Colab notebook, where we'll go over how to set up a custom dataset, label it with these foundation models, train a custom object detector with YOLOv8, and then use that to run inference on our edge devices or our own computer. I have videos about all of those different things, and I also have courses; my YOLOv7 and YOLOv8 courses both cover object detection and object tracking.

Here we'll just take a look at the GitHub repository. Autodistill uses big, slower foundation models to train smaller, faster supervised models: it's basically using Grounding DINO together with Segment Anything to do few-shot and zero-shot learning for these object detection models. Scrolling through, this is the example we're going to use: an example output where we detect milk bottles on a production line. It's a very nice, specific use case for a real-life production setup, where we want to detect milk bottles moving around on conveyor belts in a factory. We just have a dataset with all the images and no annotations or labels for any of the objects; we throw it into the Autodistill framework, and then we can train our own custom object detector with a YOLOv8 model and get these nice results.

Back in the repository we can see the different features: a pluggable interface to connect models together, and automatically labeled datasets, which is really cool since we don't have to spend a lot of time actually labeling our dataset and can do faster prototyping. We can train fast supervised models, own our own model, and deploy distilled models to the cloud or to the edge. So we're using these large foundation models that understand complex scenes and complex objects: the model can just learn that there's a milk bottle in the image without any labeling, because it knows what one looks like. It then uses the Segment Anything model to get the bounding box and also the segmentation mask, so this can be used both for segmentation problems and for object detection.

Then there's the basic concept, which we won't go into in depth: we have the task and the base model, so these large foundation models are used as the base model. We also have the ontology, which is basically just a mapping from the prompts, or what we actually want to detect, to the classes for our dataset.
We then create a dataset, and at the end we have the target model, and also a distilled model as the final output. You can read a bit about the theory and also the limitations here. Again, we have the base models with Grounding DINO and SAM, and then we can train, for example, YOLOv8 for instance segmentation or object detection; we can also train all the other target models they support. So this is a really cool framework: we don't have to spend any time actually labeling our images, which is especially useful if you just want to test some different datasets out. If you have a video of a production line, you can basically just extract all the images, use them with this framework, train your own custom object detector, and then you have a deep learning AI model running.

Here we can see how to install it, which we'll go over in the Colab notebook as well. We can also see how to distill a model in a custom Python script; we're going to do that in another video. In this video we're just going to see how to train it, and then in other videos we'll integrate it with our own custom Python scripts. We can also visualize our predictions. Down at the bottom we can see object detection with Grounded SAM and Grounding DINO, and up at the top all the different target models, like DETR and YOLOv8, so we have all these new state-of-the-art object detection models and also instance segmentation with YOLOv8. This is really cool.

Okay, so now let's jump into the Google Colab notebook. We're basically just going to scroll through it, and I'll explain what's going on and how you can use it on your own custom dataset. At the start they have a section on what the framework is, which we just went over in the GitHub repository, so we'll scroll past that. First of all, make sure that you choose a GPU runtime here in Google Colab. We can also choose which GPU we want; I'm going to go with an A100 because I'm on the Pro version of Google Colab, and I'm also choosing High-RAM down at the bottom for the runtime shape. Then we connect to our runtime, and we can go down and actually run our code blocks.

The steps in this tutorial are: before you start, image dataset preparation, auto-labeling the dataset, training the target model (a YOLOv8 model), evaluating that model and looking at the results and training graphs, and then running video inference. We can also upload the model to Roboflow at a later point, but in this video we'll see how to do video inference, and we're also going to export the model. Before we start, we run nvidia-smi so we can see all the details about our GPU: again, we have an NVIDIA A100 GPU available. I'm just going to zoom in a bit so you can better see what's going on. We can see the GPU, and we can also see that we have 40 gigabytes of memory on the GPU, which is really good.

Now let's scroll down, install Autodistill, and see how we can set it up. First of all, we pip install autodistill, autodistill-grounded-sam, and autodistill-yolov8, plus supervision for visualizing the results later on, both from our Grounded SAM base model and from our YOLOv8 target model.
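As a quick reference, that install cell looks roughly like this (a minimal sketch; the package names are the ones published on PyPI, and the -q flag just quiets pip's output):

```python
# Colab cell: install Autodistill, its Grounded SAM and YOLOv8 plugins,
# and supervision for visualization
!pip install -q autodistill autodistill-grounded-sam autodistill-yolov8 supervision
```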
Once the install is done, we can get started with auto-labeling our dataset; the install might take a couple of minutes. Down here we import os and get our home directory on Google Colab; running it shows we're in the /content directory, which is what you see in the file browser on the left. Now we can do the image dataset preparation. We create a directory called images, which is where we're going to download our videos and extract our images to, and then we can feed those images into the Grounded SAM model (Grounding DINO plus SAM) to actually label our dataset. After creating the directory, we can see the images folder over on the left.

Here we can download raw videos. If you have your own dataset you can basically skip these steps, since you'll already have all your images inside your dataset; this part is just for downloading a video, and if you have your own video you can use that as well. Then we convert the videos into images, basically extracting frames from the video that we can then use in our dataset for auto-labeling. So we create the videos folder, cd into it, use wget to download this video from Google Drive, and unzip it. Over on the left we should be able to see our videos, with these milk bottles that we want to detect, as also shown on the Roboflow GitHub repository.

Scrolling down, we can then convert our videos into images: we set our video directory path and our image directory path, and then go down and extract all the frames from the video. This might take a few moments; in the end it took around one minute to extract all the images from the video. Before we build a model with Autodistill, we verify that we actually have everything stored in our folders, so let's scroll up and check: all the frames extracted from the video are now inside our images folder.
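For anyone replicating the extraction step outside the notebook, here is one plain-OpenCV way to do it (a sketch under my own assumptions; the notebook uses its own helpers, and the paths, stride, and file-name pattern below are placeholders):

```python
import os
import cv2

VIDEO_PATH = "videos/milk-bottles.mp4"  # placeholder: downloaded source video
IMAGE_DIR_PATH = "images"               # placeholder: output frame directory
FRAME_STRIDE = 10                       # keep every 10th frame

os.makedirs(IMAGE_DIR_PATH, exist_ok=True)
cap = cv2.VideoCapture(VIDEO_PATH)

frame_idx, saved = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break  # end of video
    if frame_idx % FRAME_STRIDE == 0:
        out_path = os.path.join(IMAGE_DIR_PATH, f"frame-{saved:05d}.png")
        cv2.imwrite(out_path, frame)
        saved += 1
    frame_idx += 1

cap.release()
print(f"Saved {saved} frames to {IMAGE_DIR_PATH}")
```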
After that, we can display an image sample from our dataset and visualize what we actually have. Over on the right we see an example once it has loaded: this is one of the images we're going to feed into the Autodistill framework to auto-label the milk bottles and also the caps on top of them. Again, this is just really cool: we don't have to label our images, we just have a single video, throw it in, extract the images, and then we can auto-label our dataset and even train object detectors on it. We basically tell the model we want to detect milk bottles in this video, we get the detections, and we can use them to train our own custom models. The image count shows we have 132 images in our dataset; let's go down and visualize an image grid of some of them. Here we can see some examples from the dataset: all these images with milk bottles and caps, extracted from a video of a production line. These are some pretty good images, so let's now see how to actually use Autodistill to auto-label our dataset.

First of all, we define something called an ontology. An ontology is basically how the base model is prompted, as it also says here: it describes what you actually see in your dataset and what you want to train your target model on. It's the mapping from the prompts to the actual classes that you want to detect. In this example we prompt with "milk bottle", and the class for our target model will just be "bottle"; we also prompt with "blue cap" and map that to "cap". So we have two classes in our dataset that we want to predict: bottle and cap.

Then we can initialize a base model down here and do the auto-labeling. I'm going to run this first of all, because we have to define this ontology; it's one of the things we need to specify, along with the base model, target model, and the other concepts listed in the GitHub repository. We set our dataset directory path, and now we can import GroundedSAM from autodistill_grounded_sam; our base model is then initialized with our ontology for the different classes, so we throw in the prompts, get the classes out, and then we can label our dataset. After creating an instance of our base model, we just set our dataset equal to base_model.label(...): the base model has a label method that we can call directly, passing in the input folder (the path to our image directory), the extension of our images (we're using PNGs), and the output folder where we want to store the dataset, and it returns the dataset. That dataset is then our whole labeled dataset. We only need to initialize this Grounded SAM model with our ontology, meaning our prompts for how it should auto-label the dataset, and it will do everything for us: we import it, initialize the model, and call this label method. This is groundbreaking, and it's definitely going to save a lot of time in the future.
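Put together, the labeling step looks roughly like this (a sketch following the flow described above; the two directory paths are placeholders):

```python
from autodistill.detection import CaptionOntology
from autodistill_grounded_sam import GroundedSAM

IMAGE_DIR_PATH = "images"     # placeholder: folder with the extracted frames
DATASET_DIR_PATH = "dataset"  # placeholder: where the labeled dataset is written

# Ontology: maps each prompt sent to the foundation model
# to the class name the target model will be trained on
ontology = CaptionOntology({
    "milk bottle": "bottle",
    "blue cap": "cap",
})

base_model = GroundedSAM(ontology=ontology)

# Runs Grounding DINO + SAM over every .png and writes out
# a labeled dataset in YOLO format
dataset = base_model.label(
    input_folder=IMAGE_DIR_PATH,
    extension=".png",
    output_folder=DATASET_DIR_PATH,
)
```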
So let's run it and see. We have our dataset and our output folder, which is set to the variables we defined at the top. It's trying to load Grounding DINO and downloads the DINO model weights first, so this might actually take some time: we have to download the weights, run all the images through the model, label them, and so on. You can see it downloading the model, so it takes a while. And that was it: our whole dataset is now auto-labeled with this new Autodistill framework. Again, it took a couple of minutes, maybe three to five, and now we have all 132 images labeled. Over here we have our dataset with all the annotations; let's open it, though we probably shouldn't browse through all 132. We also have our images, with a train and a validation split, so we can go in and see all the individual images in our dataset. So now our whole dataset is auto-labeled: we just went through the Colab notebook, ran a couple of code blocks, and now we're actually here with our own auto-labeled dataset.

Next we can go down and display a dataset sample. We set our annotation directory path, our image directory path, and also our data.yaml path, and then we can use supervision to create the detections for our dataset from the YOLO format. We basically just set up the directories we want to use, and then we can check the length of our dataset, which should also return the number of images. Here we get 105 rather than 132 because we're only using the training images; I think that's correct, yes. Then we use supervision to create our image examples: we have a for loop running through the file names, creating annotated images with the mask annotator and the box annotator from supervision, and then plotting an image grid down at the bottom with the samples from our auto-labeled dataset. It might take a moment before we see the results, but here we go.

Now we have the same images we saw at the start, but with labels. We have our cap at the top, which is the green segmentation mask, and the pink or red one, which is the milk bottle. All these annotations were labeled automatically by the foundation models: we basically just threw in the prompts "milk bottle" and "blue cap", and now we have these labeled images in our dataset. These are some really great annotations too, just about perfect annotations from the Segment Anything model from Meta AI. Scrolling through a few more examples, it does essentially perfect annotation on all of them, so we should have good annotations that we can feed directly into our YOLOv8 model, which is what we're going to do now.
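As a reference for the dataset-sample step above, loading the labeled dataset back in with supervision looks roughly like this (a sketch; supervision's API has shifted slightly between versions, and the paths are placeholders):

```python
import supervision as sv

DATASET_DIR_PATH = "dataset"  # placeholder: output folder from the labeling step

# Parse the YOLO-format annotations that Autodistill wrote out
dataset = sv.DetectionDataset.from_yolo(
    images_directory_path=f"{DATASET_DIR_PATH}/train/images",
    annotations_directory_path=f"{DATASET_DIR_PATH}/train/labels",
    data_yaml_path=f"{DATASET_DIR_PATH}/data.yaml",
)

print(len(dataset))  # e.g. 105: only the training split is loaded here
```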
Now we have our target model. We're going to set up the YOLOv8 model as we've done in the other videos; I also have videos on the channel where we train on a custom dataset with manually labeled images, but now we can just use this script and have the dataset auto-labeled with Autodistill. We go back to our home directory, import the YOLOv8 class, and set up our target model. We can also specify which model we want to use; in this example we're going to choose the nano model for YOLOv8. Then we call the train method, passing in our dataset's data.yaml path and also the number of epochs we want to train the model for. First of all we get a summary of our model: we see the number of parameters, all the hyperparameters for the training, and the CUDA device we're using, the NVIDIA A100. Then we see the training results for each epoch down at the bottom. I'm going to look over on the right at the mean average precision; we can also see the box loss and the class loss, but the main metric here is the mean average precision. So let's let it run, train for a couple of minutes, and then take a look at the results.

Our model is now done training, so let's look at the training results for each epoch. I'm not sure why we get zero in the mAP@0.50 and mAP@0.50:0.95 columns at first; we might get some results later on, but we can still see the class loss and the box loss decreasing over the epochs, which is good. Scrolling a bit further down, now we do get the mAP@0.50 and mAP@0.50:0.95, and they're pretty nice; the losses keep decreasing over time, the class loss is coming down, and it actually looks pretty good. We can see that the mean average precision is still increasing, around 0.81, 0.82, 0.84 here, which is a pretty good mean average precision just from training on around a hundred images. Just remember: we have about 130 images in total, and only around 100 of them in our training set, and we're training our own custom YOLOv8 model on our auto-labeled dataset. We end up with a mAP@0.50 of 0.86 and a mAP@0.50:0.95 of almost 0.79, which is pretty good, and we also have a good box loss and class loss.
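The training step itself is only a few lines (a sketch using the autodistill-yolov8 plugin; the data.yaml path and the epoch count are placeholders):

```python
from autodistill_yolov8 import YOLOv8

# Nano variant of YOLOv8 as the target model
target_model = YOLOv8("yolov8n.pt")

# Train on the auto-labeled dataset produced by base_model.label(...)
target_model.train("dataset/data.yaml", epochs=50)
```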
We can now go down and evaluate the target model. If we go inside our directory, we have runs/detect/train, and there we can see all the outputs: the confusion matrix, the results, and so on. We can also visualize them directly in a code block, so let's do that. Now we can see our confusion matrix; along the diagonal we should have the most values, because those are our true positives, our correct predictions. On one axis we have the predicted labels, and on the x-axis we have the ground-truth labels. We can see that for the bottles we do pretty good detection, and also for the cap, the blue cap on top of the milk bottles; we get some pretty good detections for that too.

We can also look at the training graphs. The box loss is decreasing over time, as we saw when we went over the per-epoch results, and the class loss decreases as well. We have our precision here, which jumps from zero to around 0.80 almost instantly and then more or less converges, though it's still increasing slightly; we could probably increase the number of epochs and get slightly better results. The mean average precision down at the bottom hasn't fully converged yet either, but we have a pretty good, pretty accurate model that can actually do object detection: a target model trained on an auto-labeled dataset. We also have a validation batch with predictions, and here we see some of the results from our YOLOv8 model. It does miss some predictions: it's pretty good at detecting the caps, and we can see some nice detections of the bottles here and here, but sometimes we get false positives, and we miss detections here and here. It seems like we can't really handle the scenario where we only see the top part of the bottle, or where the bottles are very close to the camera.

We can also go down and throw in a video, so we'll just run the video we trained the model on through it and look at the inference. We call the predict method and can see it running all the frames from the video through the YOLOv8 model; we also get the number of bottles it detects, the number of caps, and the inference time over on the right. Scrolling down to the bottom, it is basically just running all the frames through the model; the results are saved to runs/detect/predict, and then we can create a video from that. Let's go over to the left and take a look: inside detect we have predict, and now we have an MP4 file, so let's download it and see the results on the actual video.

Looking at the video, we get some really nice results. We detect the bottles in the foreground; we lose some of them sometimes, but again, we're only doing object detection in this scenario. The caps up at the top we don't really detect, or we detect them as just a single one; sometimes we detect them on top of the milk bottles, but sometimes we also miss them, as we can see over here. Right now we detect the cap, but with a low confidence score, even though this was actually a pretty good detection, and we can still detect the bottles with a high confidence score. Again, just remember that we only trained this model on around 100 images, so these are some really nice results.

So thank you guys, and remember to hit the subscribe button and the bell notification, and like this video if you like the content and want more in the future. We're definitely going to use this Autodistill framework way more in the future. See you next week, guys, bye for now.
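For anyone following along outside the notebook, the video-inference step shown above corresponds roughly to this Ultralytics call (a sketch; the weights path assumes the default runs/detect/train layout, and the video path is a placeholder):

```python
from ultralytics import YOLO

# Load the best checkpoint written during training
model = YOLO("runs/detect/train/weights/best.pt")

# Run inference on the source video; save=True writes the
# annotated output to runs/detect/predict
model.predict(source="videos/milk-bottles.mp4", save=True, conf=0.5)
```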
Info
Channel: Nicolai Nielsen
Views: 13,972
Keywords: yolov8 custom object detection, object detection, computer vision, yolov8 tutorial, yolov8, yolo, state of the art object detection, ai, best yolo model, yolov8 real time object detection, instance segmentation, yolo v8 train, yolo v8 custom dataset, yolov8 custom training, grounding dino, segment anything, grounding dino sam, grounding dino object detection, roboflow, autodistill, roboflow autodistill, roboflow auto annotation, auto label dataset, auto annotate dataset
Id: 7tuXEvZ2YNw
Length: 21min 41sec (1301 seconds)
Published: Mon Jun 19 2023