Autodistill: Train YOLOv8 with ZERO Annotations

Captions
I built that model in literally 30 minutes, and you can do it too. Image labeling is a slow and expensive process, but we need it to train real-time object detection and instance segmentation models like YOLOv8 and YOLO-NAS. Well... we used to. Autodistill is a new library that allows you to distill knowledge from large foundation models and transfer it into smaller, highly optimized ones.

Autodistill consists of six core elements: task, base model, ontology, dataset, target model, and distilled model. Let's start with the task: it decides whether Autodistill will automatically build a classification, detection, or segmentation model, depending on what you select. The base model is a large foundation model that knows a lot about a lot. Base models are very often multimodal, meaning they can take different types of input, like image and text, and can perform different tasks. For now, Autodistill supports only one base model, Grounded SAM, but support for CLIP and many more is coming soon. You can also very easily add your own base model and plug it into the Autodistill framework. Base models are slow and terribly expensive, too slow to be used in real-time scenarios; however, they are great for automatic dataset annotation. Such a dataset can later be used to train a much smaller and faster target model. Models like YOLOv8 or DETR are highly optimized to perform a specific task, like object detection or instance segmentation, on a selected set of classes. To define that set of classes, we need an ontology; you can think of it as the recipe that translates instructions understandable by the base model into the set of classes understandable by the target model. At the end, we just use the dataset created with our base model to train our distilled model.

Today we are going to test Autodistill and use it to fully automatically build a YOLOv8 model capable of detecting bottles moving on a conveyor. Traditionally, a project like that would require several hours of annotation, but today we are going to do it end to end in less than 30 minutes. Here's the plan. First, we need to create a dataset. In principle, to use Autodistill you only need images; no labels are required. We'll use the Grounded SAM base model to automatically create those annotations for us. Grounded SAM is actually a combination of two models that we have covered on this channel before: Grounding DINO and the Segment Anything Model (SAM). Grounding DINO is a zero-shot object detection model, which pretty much means it can detect objects in images without any training. SAM, on the other hand, is a promptable model capable of converting bounding boxes into segmentation masks. As a result, the base model will create fully annotated train and validation datasets that we are going to use to train our target YOLOv8 model. The best thing about it is that the whole process is fully automated; we'll just execute a few commands and that's it.

Okay, enough of the talking, let's dive in. As usual, we created a Jupyter notebook that I'm going to use during this tutorial. I highly encourage you to open it in a separate tab and follow my instructions; the link to Google Colab is in the description below, but you can also find it in the Roboflow Notebooks repository. First things first, we need to confirm that our Google Colab environment is GPU-accelerated. To do that, we'll scroll to the first cell of our notebook and run the nvidia-smi command. If everything works as expected, we should see its output with details about our environment, like driver and CUDA versions. If not, make sure to follow the instructions in the "Before you start" section.

Awesome, now it's time to set up our Python environment, and to do that we need to install a few packages. Each base and target model has its own repository and pip package, so when you design your Autodistill pipeline, you need to install the packages associated with your base and target model; in our case that's Grounded SAM and YOLOv8. On top of that, you also need to install the base autodistill package, as it manages the whole distillation process. You can find installation instructions in the Autodistill repository, along with the list of supported base and target models. In the end, we just set up our HOME constant to make it a bit easier to manage paths to videos, images, labels, and weights, and we are good to go.
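As a sketch, the environment check and setup described above could look like this in a single Colab cell. The package list follows Autodistill's one-pip-package-per-model convention, and the HOME layout is an assumption carried through the snippets that follow, not the notebook's exact contents:

```python
# Colab cell: confirm the runtime is GPU-accelerated, then install the core
# autodistill package plus one pip package per base/target model.
!nvidia-smi
!pip install -q autodistill autodistill-grounded-sam autodistill-yolov8 supervision

import os

HOME = os.getcwd()  # paths to videos, images, labels, and weights hang off HOME
print(HOME)
```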
To use Autodistill, all you need is a set of images that you want to automatically annotate and use for target model training. We will store those images in the images subdirectory located in our HOME directory; you can see it right now in the file explorer. If you are following along with your own dataset, make sure to copy your images directly into that directory. In this tutorial, however, I will start with a set of video files and show you how to turn them into a ready-to-use collection of images.

To start with, I have prepared a set of a few videos recorded in a milk bottling plant. Each video lasts a few thousand seconds and shows bottles moving along a production line. All the videos are unique; each one was taken at a different location and from a different angle. Now all we need to do is convert those videos into images. To make it a little bit easier, we will use supervision, one of the packages we installed during our Python environment setup. Before we start processing, we'll divide our videos into two groups: test videos and train videos. We'll put the two test videos to one side and use them for model evaluation later on; the rest is going to be converted into our dataset. supervision will open every video, loop over its frames, and save every 10th frame into our images directory. Here is a small sample of our extracted images.

We can now proceed to the next stage, where we auto-annotate that data. This part is straightforward but a bit time-consuming; as I said in the intro, base models are powerful but unfortunately a bit slow. Before we start processing, we need to specify the objects we want to auto-annotate. I would like my model to detect milk bottles and those small blue caps at the top, so I define my ontology and assign milk bottles to the bottle class and blue caps to the cap class. The instructions on the left will be passed as prompts to Grounded SAM. If you work with your own dataset, make sure the prompts are descriptive enough; depending on your use case, you may need to experiment a little bit. If you want to learn more, make sure to watch our previous videos, where we do a lot of prompt engineering experiments; the links are in the description.

Now you just need to sit back, relax, and wait for the base model to auto-annotate your data. Thanks to the magic of cinema, we can skip that part and take a look at the base model's results right away. We can see that our dataset got divided into two parts, both containing images and labels. Just as before, we can take a small sample of the images and plot them along with the auto-generated annotations. Well, those results look very promising. We are almost there.
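A condensed sketch of the two data steps above, frame extraction with supervision and auto-annotation with Grounded SAM, might look like this. The videos/train directory, the .mov extension, and the exact prompt strings are illustrative assumptions; `CaptionOntology` maps prompts the base model understands to class names the target model will learn:

```python
import glob
import os

import cv2
import supervision as sv
from autodistill.detection import CaptionOntology
from autodistill_grounded_sam import GroundedSAM

IMAGE_DIR = os.path.join(HOME, "images")
os.makedirs(IMAGE_DIR, exist_ok=True)

# 1. Convert the train videos into images, saving every 10th frame.
#    The videos/train directory and .mov extension are hypothetical.
for video_path in sorted(glob.glob(os.path.join(HOME, "videos", "train", "*.mov"))):
    name = os.path.splitext(os.path.basename(video_path))[0]
    for i, frame in enumerate(sv.get_video_frames_generator(source_path=video_path)):
        if i % 10 == 0:
            cv2.imwrite(os.path.join(IMAGE_DIR, f"{name}-{i:05d}.png"), frame)

# 2. The ontology: keys are prompts for the base model, values are the
#    class names the target model will be trained on.
ontology = CaptionOntology({
    "milk bottle": "bottle",
    "blue cap": "cap",
})

# 3. Auto-annotate: Grounded SAM labels every image in IMAGE_DIR and writes
#    annotated train/valid splits to the output folder.
base_model = GroundedSAM(ontology=ontology)
dataset = base_model.label(
    input_folder=IMAGE_DIR,
    extension=".png",
    output_folder=os.path.join(HOME, "dataset"),
)
```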
Now we just need to use our auto-annotated dataset to train our YOLOv8 model. The training process itself is exactly the same as with a regular YOLOv8 model; I'm just setting the starting checkpoint and the number of epochs. Press Shift+Enter, and we just need to wait for the model to train. Once again, we'll use the magic of cinema and speed up that process. Ah, and if you want to learn more about YOLOv8 training, we probably have the most popular YouTube video about it; the link is in the top right corner and in the description below.

Now it's time to test our distilled model. Some of you probably remember that at the beginning of the notebook we put two videos from our dataset aside. We did that to prevent leakage of test data into the training set. Now we can use those videos to evaluate our freshly trained model; this way, we can be sure the model is capable of detecting milk bottles even in video footage it has never seen before. To perform the actual inference, we'll use the Ultralytics CLI. We just need to provide two paths: the first leading to our distilled model and the second leading to the raw video.

The inference results honestly blew my mind. Sure, we have some false positives and false negatives; however, remember that we didn't spend even a second labeling those images. Of course, the effectiveness of Autodistill is highly dependent on the specific project. There are projects where Autodistill performs flawlessly and can pretty much automate the entire process from start to finish; on the other hand, there are others where Autodistill will need your help: it will auto-annotate the data but will require your input to clean up the auto-generated labels. However, keep in mind that Autodistill is a framework, and with every new powerful base model it will get better and better. Oh boy, I can't wait for GPT-4 image input to be released so that we can finally see how well it handles automatic image labeling. And remember, GPT-4 is not just a language model; it's also a vision model. In the meantime, make sure to test Autodistill on your dataset and let us know in the comments which base and target models we should support next. You can find the link to the Autodistill GitHub repository in the description below.

And that's all for today. If you enjoyed the video, make sure to like and subscribe, and stay tuned for more computer vision content coming to this channel soon. My name is Peter, and I'll see you next time. Bye!
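For readers skimming the transcript, here is a minimal sketch of the training and inference steps described above, using the autodistill-yolov8 wrapper and the Ultralytics CLI. The yolov8n.pt starting checkpoint, the epoch count, and the test video path are placeholder assumptions, and the weights path is simply Ultralytics' default output location:

```python
import os

from autodistill_yolov8 import YOLOv8

# Train the target model on the auto-annotated dataset; the checkpoint and
# epoch count are placeholders, not the exact values used in the video.
target_model = YOLOv8("yolov8n.pt")
target_model.train(os.path.join(HOME, "dataset", "data.yaml"), epochs=50)

# Inference via the Ultralytics CLI on a held-out test video: the first path
# leads to the distilled model weights, the second to the raw video.
!yolo predict model={HOME}/runs/detect/train/weights/best.pt source={HOME}/videos/test-video.mp4
```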
Info
Channel: Roboflow
Views: 23,760
Keywords: Grounding DINO, Object Detection, Zero-Shot Object Detection, Segment Anything, SAM, Instance Segmentation, Auto Annotation, Labeling, GroundedSAM, YOLO, YOLOv8, Autodistill, Python, Computer Vision, Prompt, Ontology, Tutorial
Id: gKTYMfwPo4M
Length: 10min 25sec (625 seconds)
Published: Thu Jun 08 2023