The primary claim of YOLO-NAS is that it can
detect small objects better than previous models. Although we can run several inference
experiments to analyze the results, training it on a challenging dataset will give us a
better understanding. Hey there, welcome to LearnOpenCV. In this video, we will train YOLO-NAS on a custom dataset and test it on an unseen set. We use the UAV dataset, or the Unmanned Aerial Vehicle dataset. It is made up of 2898 thermal images captured from different scenes, that is, schools, parking lots, roads, playgrounds, etc., with flight altitudes ranging from 60 to 130 meters, camera perspectives between 30 and 90 degrees, and both day and night light intensities, across five classes: person, bicycle, car, other vehicle, and don't care. This is what the dataset looks like.
It is already split into 2008 training images, 287 validation images, and 571 test images, and these are the annotations. They are in YOLO format: each image has a text file with the same name, and each file contains all the object annotations for that image, that is, the class label, x-center, y-center, followed by the width and height of the bounding box. The annotations are normalized to lie within the range 0 and 1; a hypothetical label file is sketched below.
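To make the format concrete, here is a hypothetical label file with two objects, where each line is class_id, x_center, y_center, width, height (the values are illustrative, not taken from the dataset):

```
0 0.512 0.348 0.062 0.114
2 0.731 0.605 0.154 0.098
```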
Let's head to the notebook and train the YOLO-NAS models. First, we install super-gradients, the home repository of YOLO-NAS, using pip install super-gradients. Since we are running on Colab, we will need to restart the runtime. After the installation is complete,
we'll import all the required modules. The Trainer initiates the training process and will set up the experiment directory. Data loaders from super-gradients are used for hassle-free training of YOLO-NAS models. coco_detection_yolo_format_train and coco_detection_yolo_format_val will help us define the training and validation datasets, whereas models will help initialize the YOLO-NAS models. YOLO-NAS uses the PPYoloELoss during training, and the detection metrics are used to monitor the mAP at 50% IoU as well as the primary metric. A sketch of these imports is shown below.
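Here is a sketch of the imports described above, assuming the super-gradients package layout at the time of the video; module paths may differ slightly across versions.

```python
from super_gradients.training import Trainer, models
from super_gradients.training.dataloaders.dataloaders import (
    coco_detection_yolo_format_train,
    coco_detection_yolo_format_val,
)
from super_gradients.training.losses import PPYoloELoss
from super_gradients.training.metrics import (
    DetectionMetrics_050,
    DetectionMetrics_050_095,
)
from super_gradients.training.models.detection_models.pp_yolo_e import (
    PPYoloEPostPredictionCallback,
)
```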
Now that we have imported all the required libraries, let's move on to the next step: downloading the dataset. We pass the URL and the file name to the function. If the dataset zip file does not exist, we use requests to download the file. Then we pass the zip file to the zipfile library and extract all the files, roughly as sketched below.
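A rough sketch of such a download-and-extract helper; the URL and file name below are placeholders, not the actual dataset link used in the video.

```python
import os
import zipfile
import requests

def download_and_extract(url, save_name):
    if not os.path.exists(save_name):
        # Download the zip file only if it is not already on disk.
        response = requests.get(url)
        with open(save_name, 'wb') as f:
            f.write(response.content)
    # Extract everything from the archive into the current directory.
    with zipfile.ZipFile(save_name, 'r') as z:
        z.extractall('./')

download_and_extract('https://example.com/uav-dataset.zip', 'uav-dataset.zip')
```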
This is what the dataset directory structure should look like. The next step involves preparing the dataset in a manner that allows us to run
multiple training experiments. First, let's define all the dataset paths and classes, and also the dataset params dictionary; we will be using these for the data loaders. We initialize the root directory, the image and label directories for the train, val, and test splits, and the dataset class labels. Then we pass these values to the dataset_params dictionary, roughly as sketched below.
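Here is a rough sketch of these paths and parameters; the directory layout and the exact class-name strings are assumptions about how the extracted dataset is organized.

```python
# Root directory of the extracted dataset and its class labels (assumed names).
ROOT_DIR = 'uav_dataset'
classes = ['person', 'bicycle', 'car', 'other vehicle', 'dont care']

dataset_params = {
    'data_dir': ROOT_DIR,
    'train_images_dir': 'images/train',
    'train_labels_dir': 'labels/train',
    'val_images_dir': 'images/val',
    'val_labels_dir': 'labels/val',
    'test_images_dir': 'images/test',
    'test_labels_dir': 'labels/test',
    'classes': classes,
}
```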
Next, we initialize the hyperparameters for all the training experiments. For the sake of this video, we'll set the number of epochs to 5, the batch size to 16, and the number of workers to 8. In reality, the models should be trained until they reach a high enough mAP. The next three functions help us visualize the
ground truth dataset images, but first, each label is assigned a unique color. We'll convert the YOLO annotation format to the bounding box format, x_min, y_min, x_max, y_max, which can easily be used by OpenCV. This is the job of the yolo2bbox function, sketched below.
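A minimal sketch of such a yolo2bbox helper, which converts a normalized YOLO box [x_center, y_center, width, height] into corner format (xmin, ymin, xmax, ymax):

```python
def yolo2bbox(bbox):
    # bbox = [x_center, y_center, width, height], all normalized to 0-1.
    xmin = bbox[0] - bbox[2] / 2
    ymin = bbox[1] - bbox[3] / 2
    xmax = bbox[0] + bbox[2] / 2
    ymax = bbox[1] + bbox[3] / 2
    return xmin, ymin, xmax, ymax
```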
Now, to plot the bounding boxes on the images, we pass the image, the bounding box coordinates, and the class labels to the plot_box function and use OpenCV functions to draw the rectangles and print the class labels. For every image, it iterates over the bounding boxes, converts the annotation format, and denormalizes the coordinates. Then, using OpenCV's rectangle function, we draw the bounding box around the object using points p1 and p2. To print the label for each box, we first estimate the text size needed, update the points p1 and p2, draw a filled rectangle using the updated points, and then print the label text over the filled rectangle. This function returns the image with the annotations drawn on it; a rough sketch is shown below.
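A rough sketch of the plot_box helper described above, building on the yolo2bbox sketch; the color map and font settings here are assumptions.

```python
import cv2

def plot_box(image, bboxes, labels, class_names, colors):
    height, width, _ = image.shape
    for box, label in zip(bboxes, labels):
        # Convert the normalized YOLO box to corners and denormalize to pixels.
        xmin, ymin, xmax, ymax = yolo2bbox(box)
        p1 = (int(xmin * width), int(ymin * height))
        p2 = (int(xmax * width), int(ymax * height))
        color = colors[int(label)]
        cv2.rectangle(image, p1, p2, color, 2)
        # Estimate the text size, draw a filled background rectangle above the
        # box, then print the class label on top of it.
        name = class_names[int(label)]
        (tw, th), _ = cv2.getTextSize(name, cv2.FONT_HERSHEY_SIMPLEX, 0.8, 2)
        cv2.rectangle(image, p1, (p1[0] + tw, p1[1] - th - 4), color, -1)
        cv2.putText(image, name, (p1[0], p1[1] - 4),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)
    return image
```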
Finally, we will plot the images. We pass the image paths, the label paths, and the number of images to be visualized to the plot function. All the paths are then extracted, sorted, zipped, and shuffled, and the requested number of images is plotted using Matplotlib. It reads each image and its label file and passes them to the previous two functions. This is
the data visualization over the training set. Next, we define the training and validation datasets. We create train_data and val_data for the training and validation splits respectively, using the dataset_params dictionary to get the dataset paths and dataset class values. The dataloader params are defined for both train and val, and they accept the batch size and the number of workers for the data loading process; a rough sketch is shown below.
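Here is a rough sketch of the two data loaders, reusing the dataset_params dictionary from earlier; the batch size and worker count follow the values mentioned above (16 and 8).

```python
train_data = coco_detection_yolo_format_train(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['train_images_dir'],
        'labels_dir': dataset_params['train_labels_dir'],
        'classes': dataset_params['classes'],
    },
    dataloader_params={'batch_size': 16, 'num_workers': 8},
)

val_data = coco_detection_yolo_format_val(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['val_images_dir'],
        'labels_dir': dataset_params['val_labels_dir'],
        'classes': dataset_params['classes'],
    },
    dataloader_params={'batch_size': 16, 'num_workers': 8},
)
```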
Run this code cell to check the default transformations being applied to the dataset. The augmentations applied are mosaic, random affine, mixup, HSV, horizontal flip, and padded rescale. Data augmentation is a major part of training any deep learning model: it allows us to train robust models for longer without overfitting. One of the above augmentations, mixup, can make the dataset extremely difficult to perceive. To remove an augmentation, we can simply pop it from the transforms list, for example as shown below.
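A quick sketch of that step; the list index of the mixup transform is an assumption and should be checked against the printout.

```python
# Inspect the default transforms applied to the training dataset,
# then drop one of them by index.
print(train_data.dataset.transforms)
train_data.dataset.transforms.pop(2)  # e.g. remove the mixup augmentation
```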
This is how the dataset looks with the augmentations applied. Next, we define the training
parameters like the warm-up settings, learning rate, optimizer, loss function, and more. We will calculate two metrics but monitor only the primary metric, mAP at 0.50 to 0.95 IoU. The best model will be saved according to this metric; a trimmed-down sketch of these training parameters is shown below.
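A trimmed-down sketch of the training parameters; the exact learning-rate schedule and thresholds here are assumptions for illustration, not the notebook's precise configuration.

```python
train_params = {
    # Warm-up and learning-rate settings (illustrative values).
    'warmup_initial_lr': 1e-6,
    'lr_warmup_epochs': 3,
    'initial_lr': 5e-4,
    'lr_mode': 'cosine',
    'cosine_final_lr_ratio': 0.1,
    'optimizer': 'AdamW',
    'max_epochs': 5,
    # Loss used by YOLO-NAS during training.
    'loss': PPYoloELoss(use_static_assigner=False,
                        num_classes=len(classes),
                        reg_max=16),
    # Two metrics are computed; only mAP@0.50:0.95 is watched.
    'valid_metrics_list': [
        DetectionMetrics_050(
            score_thres=0.1, top_k_predictions=300,
            num_cls=len(classes), normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01, nms_top_k=1000,
                max_predictions=300, nms_threshold=0.7)),
        DetectionMetrics_050_095(
            score_thres=0.1, top_k_predictions=300,
            num_cls=len(classes), normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01, nms_top_k=1000,
                max_predictions=300, nms_threshold=0.7)),
    ],
    'metric_to_watch': 'mAP@0.50:0.95',
}
```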
Now that everything is in place, we will train the YOLO-NAS small, medium, and large models. Trainer will set up the experiment for us, models.get will download the models with COCO pre-trained weights, and finally trainer.train will fine-tune the model. Pass the model, the training parameters, and the train and validation data loaders to trainer.train and hit run.
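A minimal sketch of one such training run, here for the small model; the experiment name and checkpoint directory are assumptions.

```python
# Set up the experiment directory and fine-tune one YOLO-NAS model.
trainer = Trainer(experiment_name='yolo_nas_s_uav', ckpt_root_dir='checkpoints')

model = models.get('yolo_nas_s',
                   num_classes=len(classes),
                   pretrained_weights='coco')

trainer.train(model=model,
              training_params=train_params,
              train_loader=train_data,
              valid_loader=val_data)
```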
Let's see what the console output looks like. First and foremost, it prints that by using the YOLO-NAS weights we accept the license; check this link for more details. Then it prints the training parameters. The mode of training is single-GPU training, the number of GPUs we have is one, the training set size is 2008, the batch size per GPU is 16, and batch accumulate is one. Batch accumulate specifies how many batches of forward passes will be done before a backward pass. So the total batch size is the batch size times the number of GPUs, that is 16, and the effective batch size, which also factors in the batch accumulate, is also 16. Iterations per epoch, or the number of steps, is 125, and the gradient updates per epoch are also 125. Now the model will be trained for the first epoch. It loads the batches of data
one by one and calculates the losses and metrics. After the first epoch is complete, the model is evaluated over the validation set, and then it prints the summary for the first epoch. We generally skip the training losses; the metrics calculated on the validation set are more important. We get the F1 score, mAP values, the various losses, precision, and recall, and then it starts the next epoch. It again trains the model on the train set, evaluates it over the validation split, and then prints the summary for the second epoch. From the second epoch onwards, it also starts comparing the metrics with the best and the last epoch: a green difference indicates improvement, whereas red indicates degradation. These are the metrics after 50 epochs. We have separately trained
the YOLO-NAS large model for 100 epochs. Let's run inference over the test set using this model and check the results. Check out our YOLO Master Class playlist for an intuitive understanding of YOLO models like YOLOR, YOLOv5, YOLOv8, and more. We have created a directory to save the inference
outputs to, set the device to CUDA, and loaded the large model with the best checkpoint from when we trained it for 100 epochs. To run inference, we'll be using the predict function. First, we list all the images in the test directory, then we pass each image iteratively to the predict function and save the output using out.save, finally renaming the prediction to the original image name; a rough sketch of this loop is shown below.
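A rough sketch of that inference loop; the checkpoint path, directory names, and saved-file naming are assumptions for illustration.

```python
import os
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load the large model with the best checkpoint from the 100-epoch run.
model = models.get('yolo_nas_l',
                   num_classes=len(classes),
                   checkpoint_path='checkpoints/yolo_nas_l_uav/ckpt_best.pth').to(device)

out_dir = 'inference_results'
os.makedirs(out_dir, exist_ok=True)

test_dir = os.path.join(dataset_params['data_dir'], dataset_params['test_images_dir'])
for image_name in os.listdir(test_dir):
    image_path = os.path.join(test_dir, image_name)
    out = model.predict(image_path)   # run prediction on a single image
    out.save(out_dir)                 # write the visualized prediction to disk
    # Rename the saved prediction so it matches the original image name.
    os.rename(os.path.join(out_dir, 'pred_0.jpg'),
              os.path.join(out_dir, image_name))
```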
We'll plot the ground truth labels over the inference results to visualize the outputs: yolo2bbox converts the annotations, plot_box draws boxes over the images, and plot randomly plots a few images. The video inference was run on a laptop with a GTX 1060 GPU, and the model ran at an average of 17 FPS. Given that we are using the YOLO-NAS large model here, this is decent speed. So that's how you fine-tune YOLO-NAS models over
a custom dataset. If you liked this video, why don't you try training YOLOv8 models on a custom dataset and compare them with the YOLO-NAS models from this video? Do comment on what you would like to see next, and don't forget to subscribe. Thanks for watching, until next time!