Thank you for joining this CS Board video about
YOLO-NAS, a new and improved member of the YOLO family of object detection models, which can also be used as a foundational model in this field. YOLO models have been around for a while now. They were introduced in 2015 with the paper You Only Look Once, which is what the acronym YOLO stands for, and over the years we saw various improved versions, up until YOLOv8, which was presented earlier this year by Ultralytics. Now, a company called Deci has released YOLO-NAS and showed that it achieves strong results at very low latency, with the best accuracy-latency tradeoff to date. In this video we will talk about how they were able to do that.
Let's start with Neural Architecture Search, which is what the NAS in the YOLO-NAS name stands for. Most of the time, model architectures are
designed by human experts. Since there is a huge number of potential model architectures, it is likely that even if we reach great results, we did not nail exactly the best choice of architecture out there, and we could still find a different architecture that would yield better results. This is why Neural Architecture Search was invented, and it includes three main components: a search space, which defines the set of valid architectures to choose from; a search algorithm, which is in charge of how to sample candidate architectures from the search space, as there is no way to try them one by one; and an evaluation strategy, which is used to compare candidate architectures.
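To make these three components concrete, here is a minimal, hypothetical sketch in Python of a naive random-search NAS loop. The search space, the scoring heuristic and every number in it are made up for illustration; real systems like Deci's AutoNAC are far more sophisticated.

```python
import random

# Search space: each architecture is one choice of depth, width and block type.
SEARCH_SPACE = {
    "depth": [2, 3, 4, 5],
    "width": [32, 64, 128],
    "block": ["plain_conv", "residual", "rep_vgg"],
}

def sample_architecture():
    """Search algorithm (here: plain random search) samples one candidate."""
    return {name: random.choice(options) for name, options in SEARCH_SPACE.items()}

def evaluate(arch):
    """Evaluation strategy: score a candidate on an accuracy-latency tradeoff.
    In practice this means training (or estimating) the model and measuring
    latency on the target hardware; these proxies are placeholders."""
    accuracy_proxy = arch["depth"] * arch["width"]        # bigger -> more accurate
    latency_proxy = arch["depth"] * arch["width"] ** 1.5  # bigger -> slower
    return accuracy_proxy - 0.01 * latency_proxy

best = max((sample_architecture() for _ in range(100)), key=evaluate)
print("best candidate:", best)
```

The key point is the separation of concerns: you can swap the random sampling for reinforcement learning or evolutionary search without touching the search space or the evaluation strategy.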
So in order to come up with a model architecture for YOLO-NAS, Deci used their own neural architecture search implementation, called AutoNAC, which stands for Automated Neural Architecture Construction. They provided AutoNAC with the details needed to search over possible YOLO architectures, which created an initial search space of huge size, and AutoNAC, which is hardware aware, found an optimal architecture for YOLO-NAS, optimized for the Nvidia T4, in a process that took 3,800 GPU hours. Another attribute that helped
YOLO-NAS reach great results at super low latency is its quantization-aware architecture. So what does that even mean? Well, object detection in real time is
critical for various applications, such as the safe operation of autonomous cars, so we want to deploy object detection models on cars, phones and more, instead of running the model in the cloud. However, edge devices have limited resources, which makes it hard to deploy large models on them due to their size and long inference time.
due to their size and also long inference time. Quantization in machine learning usually refers to
the process of reducing the precision of the model weights so they will consume less memory
and run faster. This however many times comes with a decrease in the model accuracy
The quantization technique used in YOLO-NAS is INT8 quantization, which converts the model weights from float32 to int8, so each weight takes one byte in memory instead of four.
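To see what that conversion looks like, here is a minimal sketch of symmetric per-tensor int8 quantization in Python with NumPy. The tensor values are made up, and real frameworks use calibration data and per-channel scales, but the core idea is just a scale factor mapping float32 values onto the integer range.

```python
import numpy as np

# A toy float32 weight tensor.
weights = np.array([0.02, -0.57, 1.30, -1.28], dtype=np.float32)

# Symmetric int8 quantization: map [-max|w|, +max|w|] onto [-127, 127].
scale = np.abs(weights).max() / 127.0
q_weights = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize to see the (small) rounding error that quantization introduces.
dequantized = q_weights.astype(np.float32) * scale

print(q_weights)                          # int8 values, 1 byte each instead of 4
print(dequantized)                        # close to the original float32 weights
print(weights.nbytes, q_weights.nbytes)   # 16 bytes vs. 4 bytes
```

Storing one byte per weight instead of four cuts weight memory by 4x, and int8 arithmetic also runs much faster on hardware like the T4's tensor cores.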
They were able to apply this quantization with minimal accuracy loss thanks to a new building block called QARepVGG, which they instructed the neural architecture search algorithm to include. QARepVGG was recently introduced in a research paper from Meituan. We'll save the details about this building block for a different video, but in short, it is an improved version of the RepVGG block, which is commonly used in object detection models, and it significantly reduces the loss of accuracy after quantization. They also used a hybrid quantization technique that applies quantization only to specific layers in the model, to balance information loss and latency.
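As a rough illustration of that per-layer decision, here is a hypothetical sketch in Python. The model, the sensitivity scores and the threshold are all invented for illustration; the actual layer selection in YOLO-NAS is done by Deci's tooling.

```python
import torch.nn as nn

# A toy three-layer model; hybrid quantization decides per layer
# whether the latency gain of int8 is worth the accuracy cost.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3),
    nn.Conv2d(16, 32, 3),
    nn.Conv2d(32, 64, 3),
)

# Hypothetical per-layer sensitivity, e.g. the accuracy drop measured
# when only that layer is quantized (these numbers are invented).
sensitivity = [0.90, 0.05, 0.08]

# Keep sensitive layers in float32, quantize the rest.
plan = {}
for i, layer in enumerate(model):
    plan[i] = "float32" if sensitivity[i] > 0.10 else "int8"

print(plan)  # {0: 'float32', 1: 'int8', 2: 'int8'}
```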
Before moving on, if you like this content then please subscribe to the channel and hit the like button, as it will help this channel grow. Let's move on to talk about YOLO-NAS being a foundational model. So what does it mean to be a foundational model?
"Say we have two different tasks at hand, one model to detect bone
fractures objects in an image and another task to detect fish objects in an aquarium.
Instead of training each model from scratch, we can start with two instances of YOLO-NAS
and fine-tune the first model on a bone scans dataset and fine-tune the second model on an
aquarium dataset. With this approach we enjoy the strength of YOLO-NAS pretraining that was done
on very large datasets with advanced techniques, while still adapting better to specific
use cases thanks to the final fine-tuning." Thank you for watching and taking the time
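As a sketch of what that starting point looks like in code, YOLO-NAS is available through Deci's open-source super-gradients library. The snippet below loads the pretrained COCO weights with a new number of classes; treat it as an illustration rather than a full training recipe, and note that the single-class counts are placeholders for the two hypothetical datasets.

```python
from super_gradients.training import models

# Two instances of the same pretrained YOLO-NAS foundation model, one per
# task; "coco" loads the pretrained checkpoint, and the detection head is
# re-initialized for the new number of classes.
fracture_model = models.get("yolo_nas_s", num_classes=1, pretrained_weights="coco")
fish_model = models.get("yolo_nas_s", num_classes=1, pretrained_weights="coco")

# From here, each model would be fine-tuned on its own dataset (bone scans
# vs. aquarium images) using super-gradients' Trainer and dataloaders.
```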
Thank you for watching and taking the time to learn about YOLO-NAS. Please let me know in the comments section if you have specific questions about the model.