The primary claim of YOLO-NAS is that it can
detect small objects better than previous models. Although we can run several inference
experiments to analyze the results, training it on a challenging dataset will give us a
better understanding. Hey there, welcome to LearnOpenCV. In this video, we will train YOLO-NAS on a custom dataset and test it on an unseen set. We use the UAV dataset, or the Unmanned Aerial Vehicle dataset. It is made up of 2898 thermal images captured from different scenes, that is, schools, parking lots, roads, playgrounds, etc., with flight altitudes ranging from 60 to 130 meters, camera perspectives between 30 and 90 degrees, and both day and night light intensities, across five classes: person, bicycle, car, other vehicle, and don't care. This is what the dataset looks like.
It is already split into 2008 training images, 287 validation images, and 571 test images, and these are the annotations. They are in YOLO format: each image has a text file with the same name, and each file contains all the object annotations for that image, that is, the class label, x-center, y-center, followed by the width and height of the bounding box. The annotations are normalized to lie within the range 0 and 1; a hypothetical label file is sketched below.
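To make the format concrete, here is a hypothetical label file with two objects, where each line is class_id, x_center, y_center, width, height (the values are illustrative, not taken from the dataset):

```
0 0.512 0.348 0.062 0.114
2 0.731 0.605 0.154 0.098
```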
Let's head to the notebook and train the YOLO-NAS models. First, we install super-gradients, the home repository of YOLO-NAS, using pip install super-gradients. Since we are running on Colab, we will need to restart the runtime. After the installation is complete,
we'll import all the required modules. The Trainer initiates the training process and will set up the experiment directory. Data loaders from super-gradients are used for hassle-free training of YOLO-NAS models. coco_detection_yolo_format_train and coco_detection_yolo_format_val will help us define the training and validation datasets, whereas models will help initialize the YOLO-NAS models. YOLO-NAS uses the PPYoloELoss during training, and the detection metrics are used to monitor the mAP at 50% IoU as well as the primary metric. A sketch of these imports is shown below.
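Here is a sketch of the imports described above, assuming the super-gradients package layout at the time of the video; module paths may differ slightly across versions.

```python
from super_gradients.training import Trainer, models
from super_gradients.training.dataloaders.dataloaders import (
    coco_detection_yolo_format_train,
    coco_detection_yolo_format_val,
)
from super_gradients.training.losses import PPYoloELoss
from super_gradients.training.metrics import (
    DetectionMetrics_050,
    DetectionMetrics_050_095,
)
from super_gradients.training.models.detection_models.pp_yolo_e import (
    PPYoloEPostPredictionCallback,
)
```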
Now that we have imported all the required libraries, let's move on to the next step: downloading the dataset. We pass the URL and the file name to the function. If the dataset zip file does not exist, we use requests to download the file. Then we pass the zip file to the zipfile library and extract all the files, roughly as sketched below.
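A rough sketch of such a download-and-extract helper; the URL and file name below are placeholders, not the actual dataset link used in the video.

```python
import os
import zipfile
import requests

def download_and_extract(url, save_name):
    if not os.path.exists(save_name):
        # Download the zip file only if it is not already on disk.
        response = requests.get(url)
        with open(save_name, 'wb') as f:
            f.write(response.content)
    # Extract everything from the archive into the current directory.
    with zipfile.ZipFile(save_name, 'r') as z:
        z.extractall('./')

download_and_extract('https://example.com/uav-dataset.zip', 'uav-dataset.zip')
```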
This is what the dataset directory structure should look like. The next step involves preparing the dataset in a manner that allows us to run
multiple training experiments. First, let's define all the dataset paths and classes, and also the dataset params dictionary; we will be using these for the data loaders. We initialize the root directory, the image and label directories for the train, val, and test splits, and the dataset class labels. Then we pass these values to the dataset_params dictionary, roughly as sketched below.
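Here is a rough sketch of these paths and parameters; the directory layout and the exact class-name strings are assumptions about how the extracted dataset is organized.

```python
# Root directory of the extracted dataset and its class labels (assumed names).
ROOT_DIR = 'uav_dataset'
classes = ['person', 'bicycle', 'car', 'other vehicle', 'dont care']

dataset_params = {
    'data_dir': ROOT_DIR,
    'train_images_dir': 'images/train',
    'train_labels_dir': 'labels/train',
    'val_images_dir': 'images/val',
    'val_labels_dir': 'labels/val',
    'test_images_dir': 'images/test',
    'test_labels_dir': 'labels/test',
    'classes': classes,
}
```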
Next, we initialize the hyperparameters for all the training experiments. For the sake of this video, we'll set the number of epochs to 5, the batch size to 16, and the number of workers to 8. In reality, the models should be trained until they reach a high enough mAP. The next three functions help us visualize the
ground truth dataset images, but first, each label is assigned a unique color. We'll convert the YOLO annotation format to the bounding box format, x_min, y_min, x_max, y_max, which can easily be used by OpenCV. This is the job of the yolo2bbox function, sketched below.
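A minimal sketch of such a yolo2bbox helper, which converts a normalized YOLO box [x_center, y_center, width, height] into corner format (xmin, ymin, xmax, ymax):

```python
def yolo2bbox(bbox):
    # bbox = [x_center, y_center, width, height], all normalized to 0-1.
    xmin = bbox[0] - bbox[2] / 2
    ymin = bbox[1] - bbox[3] / 2
    xmax = bbox[0] + bbox[2] / 2
    ymax = bbox[1] + bbox[3] / 2
    return xmin, ymin, xmax, ymax
```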
Now, to plot the bounding boxes on the images, we pass the image, the bounding box coordinates, and the class labels to the plot_box function and use OpenCV functions to draw the rectangles and print the class labels. For every image, it iterates over the bounding boxes, converts the annotation format, and denormalizes the coordinates. Then, using OpenCV's rectangle function, we draw the bounding box around the object using points p1 and p2. To print the label for each box, we first estimate the text size needed, update the points p1 and p2, draw a filled rectangle using the updated points, and then print the label text over the filled rectangle. This function returns the image with the annotations drawn on it; a rough sketch is shown below.
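A rough sketch of the plot_box helper described above, building on the yolo2bbox sketch; the color map and font settings here are assumptions.

```python
import cv2

def plot_box(image, bboxes, labels, class_names, colors):
    height, width, _ = image.shape
    for box, label in zip(bboxes, labels):
        # Convert the normalized YOLO box to corners and denormalize to pixels.
        xmin, ymin, xmax, ymax = yolo2bbox(box)
        p1 = (int(xmin * width), int(ymin * height))
        p2 = (int(xmax * width), int(ymax * height))
        color = colors[int(label)]
        cv2.rectangle(image, p1, p2, color, 2)
        # Estimate the text size, draw a filled background rectangle above the
        # box, then print the class label on top of it.
        name = class_names[int(label)]
        (tw, th), _ = cv2.getTextSize(name, cv2.FONT_HERSHEY_SIMPLEX, 0.8, 2)
        cv2.rectangle(image, p1, (p1[0] + tw, p1[1] - th - 4), color, -1)
        cv2.putText(image, name, (p1[0], p1[1] - 4),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)
    return image
```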
Finally, we will plot the images. We pass the image paths, the label paths, and the number of images to be visualized to the plot function. All the paths are then extracted, sorted, zipped, and shuffled, and the requested number of images is plotted using Matplotlib. It reads each image and its label file and passes them to the previous two functions. This is
the data visualization over the training set. Next, we define the training and validation datasets. We create train_data and val_data for the training and validation splits respectively, using the dataset_params dictionary to get the dataset paths and dataset class values. The dataloader params are defined for both train and val, and they accept the batch size and the number of workers for the data loading process; a rough sketch is shown below.
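Here is a rough sketch of the two data loaders, reusing the dataset_params dictionary from earlier; the batch size and worker count follow the values mentioned above (16 and 8).

```python
train_data = coco_detection_yolo_format_train(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['train_images_dir'],
        'labels_dir': dataset_params['train_labels_dir'],
        'classes': dataset_params['classes'],
    },
    dataloader_params={'batch_size': 16, 'num_workers': 8},
)

val_data = coco_detection_yolo_format_val(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['val_images_dir'],
        'labels_dir': dataset_params['val_labels_dir'],
        'classes': dataset_params['classes'],
    },
    dataloader_params={'batch_size': 16, 'num_workers': 8},
)
```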
Run this code cell to check the default transformations being applied to the dataset. The augmentations applied are mosaic, random affine, mixup, HSV, horizontal flip, and padded rescale. Data augmentation is a major part of training any deep learning model: it allows us to train robust models for longer without overfitting. One of the above augmentations, mixup, can make the dataset extremely difficult to perceive. To remove an augmentation, we can simply pop it from the transforms list, for example as shown below.
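A quick sketch of that step; the list index of the mixup transform is an assumption and should be checked against the printout.

```python
# Inspect the default transforms applied to the training dataset,
# then drop one of them by index.
print(train_data.dataset.transforms)
train_data.dataset.transforms.pop(2)  # e.g. remove the mixup augmentation
```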
This is how the dataset looks with the augmentations applied. Next, we define the training
parameters like the warm-up settings, learning rate, optimizer, loss function, and more. We will calculate two metrics but monitor only the primary metric, mAP at 0.50 to 0.95 IoU. The best model will be saved according to this metric; a trimmed-down sketch of these training parameters is shown below.
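A trimmed-down sketch of the training parameters; the exact learning-rate schedule and thresholds here are assumptions for illustration, not the notebook's precise configuration.

```python
train_params = {
    # Warm-up and learning-rate settings (illustrative values).
    'warmup_initial_lr': 1e-6,
    'lr_warmup_epochs': 3,
    'initial_lr': 5e-4,
    'lr_mode': 'cosine',
    'cosine_final_lr_ratio': 0.1,
    'optimizer': 'AdamW',
    'max_epochs': 5,
    # Loss used by YOLO-NAS during training.
    'loss': PPYoloELoss(use_static_assigner=False,
                        num_classes=len(classes),
                        reg_max=16),
    # Two metrics are computed; only mAP@0.50:0.95 is watched.
    'valid_metrics_list': [
        DetectionMetrics_050(
            score_thres=0.1, top_k_predictions=300,
            num_cls=len(classes), normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01, nms_top_k=1000,
                max_predictions=300, nms_threshold=0.7)),
        DetectionMetrics_050_095(
            score_thres=0.1, top_k_predictions=300,
            num_cls=len(classes), normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01, nms_top_k=1000,
                max_predictions=300, nms_threshold=0.7)),
    ],
    'metric_to_watch': 'mAP@0.50:0.95',
}
```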
Now that everything is in place, we will train the YOLO-NAS small, medium, and large models. Trainer will set up the experiment for us, models.get will download the models with COCO pre-trained weights, and finally trainer.train will fine-tune the model. Pass the model, the training parameters, and the train and validation data loaders to trainer.train and hit run.
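A minimal sketch of one such training run, here for the small model; the experiment name and checkpoint directory are assumptions.

```python
# Set up the experiment directory and fine-tune one YOLO-NAS model.
trainer = Trainer(experiment_name='yolo_nas_s_uav', ckpt_root_dir='checkpoints')

model = models.get('yolo_nas_s',
                   num_classes=len(classes),
                   pretrained_weights='coco')

trainer.train(model=model,
              training_params=train_params,
              train_loader=train_data,
              valid_loader=val_data)
```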
Let's see what the console output looks like. First and foremost, it prints that by using the YOLO-NAS weights we accept the license; check this link for more details. Then it prints the training parameters. The mode of training is single-GPU training, the number of GPUs we have is one, the training set size is 2008, the batch size per GPU is 16, and batch accumulate is one. Batch accumulate specifies how many batches of forward passes will be done before a backward pass. So the total batch size is the batch size times the number of GPUs, that is 16, and the effective batch size, which also factors in the batch accumulate, is also 16. Iterations per epoch, or the number of steps, is 125, and the gradient updates per epoch are also 125. Now the model will be trained for the first epoch. It loads the batches of data
one by one and calculates the losses and metrics. After the first epoch is complete, the model is evaluated over the validation set, and then it prints the summary for the first epoch. We generally skip the training losses; the metrics calculated on the validation set are more important. We get the F1 score, mAP values, the various losses, precision, and recall, and then it starts the next epoch. It again trains the model on the train set, evaluates it over the validation split, and then prints the summary for the second epoch. From the second epoch onwards, it also starts comparing the metrics with the best and the last epoch: a green difference indicates improvement, whereas red indicates degradation. These are the metrics after 50 epochs. We have separately trained
the YOLO-NAS large model for 100 epochs. Let's run inference over the test set using this model and check the results. Check out our YOLO Master Class playlist for an intuitive understanding of YOLO models like YOLOR, YOLOv5, YOLOv8, and more. We have created a directory to save the inference
outputs to, set the device to CUDA, and loaded the large model with the best checkpoint from when we trained it for 100 epochs. To run inference, we'll be using the predict function. First, we list all the images in the test directory, then we pass each image iteratively to the predict function and save the output using out.save, finally renaming the prediction to the original image name; a rough sketch of this loop is shown below.
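A rough sketch of that inference loop; the checkpoint path, directory names, and saved-file naming are assumptions for illustration.

```python
import os
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load the large model with the best checkpoint from the 100-epoch run.
model = models.get('yolo_nas_l',
                   num_classes=len(classes),
                   checkpoint_path='checkpoints/yolo_nas_l_uav/ckpt_best.pth').to(device)

out_dir = 'inference_results'
os.makedirs(out_dir, exist_ok=True)

test_dir = os.path.join(dataset_params['data_dir'], dataset_params['test_images_dir'])
for image_name in os.listdir(test_dir):
    image_path = os.path.join(test_dir, image_name)
    out = model.predict(image_path)   # run prediction on a single image
    out.save(out_dir)                 # write the visualized prediction to disk
    # Rename the saved prediction so it matches the original image name.
    os.rename(os.path.join(out_dir, 'pred_0.jpg'),
              os.path.join(out_dir, image_name))
```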
We'll plot the ground truth labels over the inference results to visualize the outputs: yolo2bbox converts the annotations, plot_box draws boxes over the images, and plot randomly plots a few images. The video inference was run on a laptop with a GTX 1060 GPU, and the model ran at an average of 17 FPS. Given that we are using the YOLO-NAS large model here, this is decent speed. So that's how you fine-tune YOLO-NAS models over
a custom dataset. If you liked this video, why don't you try training YOLOv8 models on a custom dataset and compare them with the YOLO-NAS models from this video? Do comment on what you would like to see next, and don't forget to subscribe. Thanks for watching, until next time!