YOLO-NAS: Step by Step Guide To Custom Object Detection Training

Video Statistics and Information

Captions
The primary claim of YOLO-NAS is that it can detect small objects better than previous models. Although we can run several inference experiments to analyze the results, training it on a challenging dataset will give us a better understanding. Hey there, welcome to LearnOpenCV. In this video, we will train YOLO-NAS on a custom dataset and test it over an unseen set.

We use the UAV dataset, or Unmanned Aerial Vehicle dataset. It is made up of 2898 thermal images taken from different scenes (schools, parking lots, roads, playgrounds, etc.), with flight altitudes ranging from 60 to 130 meters, camera perspectives between 30 and 90 degrees, and both day and night light intensities, across five classes: person, bicycle, car, other vehicle, and don't care. This is what the dataset looks like. It is already split into 2008 training images, 287 validation images, and 571 test images, and these are the annotations. They are in YOLO format: each image has a text file with the same name, and each file contains all the object annotations of the image, that is, the class label and the x center and y center, followed by the width and height of the bounding box. The annotations are normalized to lie within the range 0 to 1.

Let's head to the notebook and train the YOLO-NAS models. First we install the home repository of YOLO-NAS using pip install super-gradients. Since we are running on Colab, we will need to restart the runtime. After the installation is complete, we'll import all the required modules. The Trainer initiates the training process and will set up the experiment directory. Data loaders from SuperGradients are used for hassle-free training of YOLO-NAS models. coco_detection_yolo_format_train and coco_detection_yolo_format_val will help us define the training and validation datasets, whereas models will help initialize the YOLO-NAS models. YOLO-NAS uses the PPYoloE loss during training, and the detection metrics are used to monitor the mAP at 50% IoU and the primary metric, mAP at 0.50:0.95 IoU.

Now that we have imported all the required libraries, let's move to the next step: downloading the dataset. We pass the URL and the file name to the function. If the dataset zip file does not exist, we use requests to download it. Then we pass the zip file to the zipfile library and extract all the files. This is what the dataset directory structure should look like.

The next step involves preparing the dataset in a manner that allows us to run multiple training experiments. First, let's define all the dataset paths and classes, along with the dataset parameter dictionary; we will be using these for the data loaders. We'll initialize the root directory, the image and label directories for the train, val, and test splits, and the dataset class labels, then pass these values to the dataset_params dictionary. Next, we initialize the hyperparameters for all the training experiments. For the sake of this video, we'll set the number of epochs to 5, the batch size to 16, and the number of workers to 8. In reality, the models should be trained until they reach a high enough mAP.

The next three functions help us visualize the ground-truth dataset images, but first each label is assigned a unique color. We'll convert the YOLO annotation format to the bounding-box format xmin, ymin, xmax, ymax, which can easily be used by OpenCV. This is the job of the yolo2bbox function. These setup steps are sketched below.
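To make the setup concrete, here is roughly what the installation and imports look like. The module paths follow the public SuperGradients API, though they can shift between library versions:

```python
# Install SuperGradients once, then restart the Colab runtime:
#   pip install super-gradients

from super_gradients.training import Trainer, models
from super_gradients.training.dataloaders.dataloaders import (
    coco_detection_yolo_format_train,
    coco_detection_yolo_format_val,
)
from super_gradients.training.losses import PPYoloELoss
from super_gradients.training.metrics import (
    DetectionMetrics_050,
    DetectionMetrics_050_095,
)
from super_gradients.training.models.detection_models.pp_yolo_e import (
    PPYoloEPostPredictionCallback,
)
```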
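The download-and-extract step is plain requests plus zipfile; a minimal sketch, with download_and_extract as a hypothetical name for the helper described above:

```python
import os
import zipfile

import requests

def download_and_extract(url, file_name):
    # Skip the download if the zip file already exists.
    if not os.path.exists(file_name):
        response = requests.get(url)
        with open(file_name, 'wb') as f:
            f.write(response.content)
    # Extract everything next to the notebook.
    with zipfile.ZipFile(file_name) as zf:
        zf.extractall('./')
```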
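The dataset paths, class list, and hyperparameters might be organized like this. The directory layout and class order here are assumptions; they must match your extracted dataset and the indices used in the label files:

```python
# Assumed directory layout: adjust ROOT_DIR and the split
# sub-directories to match your extracted dataset.
ROOT_DIR = 'hit-uav'
classes = ['Person', 'Bicycle', 'Car', 'OtherVehicle', 'DontCare']

dataset_params = {
    'data_dir': ROOT_DIR,
    'train_images_dir': 'images/train',
    'train_labels_dir': 'labels/train',
    'val_images_dir': 'images/val',
    'val_labels_dir': 'labels/val',
    'test_images_dir': 'images/test',
    'test_labels_dir': 'labels/test',
    'classes': classes,
}

# Hyperparameters quoted in the video; in reality, train until
# the model reaches a high enough mAP.
EPOCHS = 5
BATCH_SIZE = 16
WORKERS = 8
```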
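And the YOLO-to-corner-format conversion is just a few lines; a sketch of yolo2bbox:

```python
def yolo2bbox(bboxes):
    # (x_center, y_center, width, height) -> (xmin, ymin, xmax, ymax),
    # still normalized to [0, 1]; denormalization happens at plot time.
    xmin, ymin = bboxes[0] - bboxes[2] / 2, bboxes[1] - bboxes[3] / 2
    xmax, ymax = bboxes[0] + bboxes[2] / 2, bboxes[1] + bboxes[3] / 2
    return xmin, ymin, xmax, ymax
```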
Now, to plot the bounding boxes on the images, we'll pass the image, the bounding-box coordinates, and the class labels to the plot_box function and use OpenCV functions to draw the rectangles and print the class labels. For every image, it iterates over the bounding boxes, converts the annotation format, and denormalizes the coordinates. Then, using OpenCV's rectangle function, we draw the bounding box around the object using points p1 and p2. To print a label for each box, we first estimate the text size needed, update points p1 and p2, draw a filled rectangle using the updated points, and then print the label text over the filled rectangle. This function returns the image with the annotations drawn on it.

Finally, we plot the images. We pass the image path, the label path, and the number of images to be visualized to the plot function. All the paths are then extracted, sorted, zipped, and shuffled, and the top N images are plotted using Matplotlib. It reads each image and label file and passes them to the previous two functions. This is the data visualization over the training set.

Next, we define the training and validation datasets. We create train_data and val_data for the training and validation splits respectively, using the dataset_params dictionary to get the dataset paths and class values. The dataloader params are defined for both train and val, and they accept the batch size and the number of workers for the data-loading process.

Run this code cell to check the default transformations being applied to the dataset. The augmentations applied are mosaic, random affine, MixUp, HSV, horizontal flip, and padded rescale. Data augmentation is a major part of training any deep learning model; it allows us to train robust models for longer without overfitting. One of the above augmentations, MixUp, can make the dataset extremely difficult to perceive. To remove an augmentation, simply pop it from the list. This is how the dataset looks with the augmentations applied.

Next, we define the training parameters like the warm-up settings, learning rate, optimizer, loss function, and more. We will calculate two metrics but monitor only the primary metric, mAP at 0.50:0.95 IoU. The best model will be saved according to this metric.

Now that everything is in place, we will train the YOLO-NAS small, medium, and large models. Trainer sets up the experiment for us, models.get downloads the models with COCO pre-trained weights, and finally trainer.train fine-tunes the model. Pass the model, the training parameters, and the train and validation data loaders to trainer.train and hit run.

Let's see what the console output looks like. First and foremost, it prints that by using the YOLO-NAS weights we accept the license; check the printed link for more details. Then it prints the training parameters. The mode of training is single-GPU training, and the number of GPUs we have is one. The training set size is 2008, the batch size per GPU is 16, and batch accumulate is one. Batch accumulate specifies how many batches of forward passes are done before a backward pass. So the total batch size is the batch size times the number of GPUs, that is 16, and the effective batch size, which also factors in batch accumulate, is also 16. Iterations per epoch, or the number of steps, is 125, and gradient updates per epoch is also 125. Now the model is trained for the first epoch: it loads the batches of data one by one and calculates the losses and metrics. The pieces described in this section are sketched below.
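A sketch of the plot_box routine described at the start of this section; the color table and exact font settings are assumptions:

```python
import cv2
import numpy as np

# One distinct color per class.
np.random.seed(42)
colors = np.random.randint(0, 255, size=(len(classes), 3)).tolist()

def plot_box(image, bboxes, labels):
    h, w, _ = image.shape
    for box_num, box in enumerate(bboxes):
        x1, y1, x2, y2 = yolo2bbox(box)
        # Denormalize the coordinates to pixel values.
        xmin, ymin = int(x1 * w), int(y1 * h)
        xmax, ymax = int(x2 * w), int(y2 * h)
        p1, p2 = (xmin, ymin), (xmax, ymax)
        class_name = classes[int(labels[box_num])]
        color = tuple(colors[int(labels[box_num])])
        cv2.rectangle(image, p1, p2, color=color, thickness=2)
        # Estimate the text size, draw a filled rectangle behind it,
        # then print the class label on top.
        tw, th = cv2.getTextSize(
            class_name, cv2.FONT_HERSHEY_SIMPLEX, 1, 2)[0]
        p2 = (p1[0] + tw, p1[1] - th - 10)
        cv2.rectangle(image, p1, p2, color=color, thickness=-1)
        cv2.putText(image, class_name, (p1[0], p1[1] - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
    return image
```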
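Defining the data loaders looks roughly like this, using the coco_detection_yolo_format_train and coco_detection_yolo_format_val factories from SuperGradients:

```python
train_data = coco_detection_yolo_format_train(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['train_images_dir'],
        'labels_dir': dataset_params['train_labels_dir'],
        'classes': dataset_params['classes'],
    },
    dataloader_params={'batch_size': BATCH_SIZE, 'num_workers': WORKERS},
)

val_data = coco_detection_yolo_format_val(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['val_images_dir'],
        'labels_dir': dataset_params['val_labels_dir'],
        'classes': dataset_params['classes'],
    },
    dataloader_params={'batch_size': BATCH_SIZE, 'num_workers': WORKERS},
)

# Inspect the default transforms; the index of the MixUp entry
# varies by SuperGradients version, so check before popping.
print(train_data.dataset.transforms)
# train_data.dataset.transforms.pop(2)
```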
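The training parameters are along these lines. The structure matches the SuperGradients detection recipes, but the specific warm-up, learning-rate, and EMA values here are assumptions rather than a verbatim copy of the notebook:

```python
train_params = {
    'silent_mode': False,
    'average_best_models': True,
    # Assumed warm-up and learning-rate schedule.
    'warmup_mode': 'linear_epoch_step',
    'warmup_initial_lr': 1e-6,
    'lr_warmup_epochs': 3,
    'initial_lr': 5e-4,
    'lr_mode': 'cosine',
    'cosine_final_lr_ratio': 0.1,
    'optimizer': 'Adam',
    'optimizer_params': {'weight_decay': 0.0001},
    'zero_weight_decay_on_bias_and_bn': True,
    'ema': True,
    'ema_params': {'decay': 0.9, 'decay_type': 'threshold'},
    'max_epochs': EPOCHS,
    'mixed_precision': True,
    # YOLO-NAS is trained with the PPYoloE loss.
    'loss': PPYoloELoss(
        use_static_assigner=False,
        num_classes=len(dataset_params['classes']),
        reg_max=16,
    ),
    # Two metrics are computed, but only one is watched.
    'valid_metrics_list': [
        DetectionMetrics_050(
            score_thres=0.1,
            top_k_predictions=300,
            num_cls=len(dataset_params['classes']),
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01, nms_top_k=1000,
                max_predictions=300, nms_threshold=0.7,
            ),
        ),
        DetectionMetrics_050_095(
            score_thres=0.1,
            top_k_predictions=300,
            num_cls=len(dataset_params['classes']),
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01, nms_top_k=1000,
                max_predictions=300, nms_threshold=0.7,
            ),
        ),
    ],
    # The best checkpoint is saved according to this metric.
    'metric_to_watch': 'mAP@0.50:0.95',
}
```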
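Finally, setting up the trainer and launching training; the experiment name is a hypothetical choice:

```python
trainer = Trainer(
    experiment_name='yolo_nas_s_uav',
    ckpt_root_dir='checkpoints',
)

# Swap in 'yolo_nas_m' or 'yolo_nas_l' for the other two models.
model = models.get(
    'yolo_nas_s',
    num_classes=len(dataset_params['classes']),
    pretrained_weights='coco',
)

trainer.train(
    model=model,
    training_params=train_params,
    train_loader=train_data,
    valid_loader=val_data,
)
```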
After the first epoch is complete, the model is evaluated over the validation set, and then it prints the summary for the first epoch. We generally skip the training losses; the metrics calculated on the validation set are more important. We get the F1 score, mAP values, various losses, precision, and recall. Then it starts the next epoch: it again trains the model on the train set, evaluates over the validation split, and then prints the summary for the second epoch. From the second epoch onward, it also compares the metrics with the best and the last epoch; a green difference indicates improvement, whereas red indicates degradation. These are the metrics after 50 epochs.

We have separately trained the YOLO-NAS large model for 100 epochs; let's run inference over the test set using this model and check the results. Check out our YOLO Master Class playlist for an intuitive understanding of YOLO models like YOLOR, YOLOv5, YOLOv8, and more.

We have created a directory to save the inference outputs to, set the device to CUDA, and loaded the large model with the best training epoch from the 100-epoch run. To run inference, we'll be using the predict function. First we list all the images in the test directory, then we pass each image iteratively to the predict function, save the output using out.save, and finally rename the prediction to the original image name.

We'll plot the ground-truth labels over the inference results to visualize the outputs: yolo2bbox converts the annotations, plot_box draws boxes over the images, and plot randomly plots a few images.

The video inference was run on a laptop with a GTX 1060 GPU, and the model ran at an average of 17 FPS. Given that we are using the YOLO-NAS large model here, we are getting decent speed. So that's how you fine-tune YOLO-NAS models on a custom dataset. If you liked this video, why don't you try training YOLOv8 models on a custom dataset, or compare them with the YOLO-NAS models from this video? Do comment on what you would like to see next, and don't forget to subscribe. Thanks for watching, until next time!
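A sketch of the inference loop, assuming a checkpoint path based on the Trainer defaults; the predict and save calls exist in SuperGradients, though their exact signatures can vary by version:

```python
import glob
import os

import torch

os.makedirs('inference_results', exist_ok=True)
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load the large model with the best checkpoint from the 100-epoch run
# (the checkpoint path here is an assumption).
best_model = models.get(
    'yolo_nas_l',
    num_classes=len(dataset_params['classes']),
    checkpoint_path='checkpoints/yolo_nas_l_uav/ckpt_best.pth',
).to(device)

for img_path in glob.glob(os.path.join(ROOT_DIR, 'images/test', '*.jpg')):
    out = best_model.predict(img_path)
    # Exact save signature may vary by version; the video then renames
    # the saved prediction to the original image name.
    out.save('inference_results')
```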
Info
Channel: LearnOpenCV
Views: 16,731
Keywords: Yolo-NAS, Yolo, yolomodels, objectdetection, deeplearning, learnopencv, opencv, yolov8, yolo nas, yolo nas github, yolo nas new model, yolo nas tutorial, yolo nas in python, yolo nas model, yolo nas web app streamlit webcam, yolo nas model training, yolo nas with custom model, yolo nas demo, computer vision, computer vision tutorial, what is yolo, yolonas github, yolo_nas paper, yolo-nas ultralytics, yolo-nas architecture, yolo-nas vs yolo v8, yolo-nas deci ai, yolo-nas segmentation
Id: vfQYRJ1x4Qg
Length: 11min 30sec (690 seconds)
Published: Mon Jun 05 2023