Hey everyone, Ivan here. Welcome to
the first part of the YOLOv5 Series! In this series, we'll learn how to train a YOLOv5
model on a custom dataset, from start to finish. We'll look at how to install YOLOv5 and its
dependencies, on both Google Colab and on Windows. Then we'll learn about collecting and labelling
our custom data. We'll also learn how to properly configure and monitor training with the
Weights & Biases machine learning tools. This video, Part 1, focuses on two things
in particular: quickly setting up YOLOv5 on Google Colab, and diving deep into
what it takes to install it on Windows successfully with GPU support. In both cases,
the end goal is the ability to run inference, in other words, having the pretrained model
detect objects in our images or videos. If you have any questions or comments
at any point throughout this video, feel free to leave them in the comment
section down below. Let's get started! I've actually already made an overview video about
training YOLOv5 with Weights & Biases integration in an example Colab notebook. If you want to
get a feel for what the training will look like, I really recommend that you check that video
out. You could call it part zero of this series. In the Colab notebook from that
video, the first cell sets things up. The second cell runs inference, aka
detecting objects in an example image. The `detect.py` script automatically downloads
a YOLOv5 model to perform inference on a selected image. We can select a new example
image, and see how the model will do on it. We'll come back to Colab when we talk about training in part 3 of the series, but
that's really pretty much it for now! Colab is great, but when it comes to
inference there's a very serious drawback: we can't run detection in real
time, say, from our webcam feed. That's where we need to get our hands dirty
and set things up on a system like Windows. The first thing that we need to do is to download
and unzip the YOLOv5 repository from GitHub. Now, let's go over the requirements outlined
in the repo: we need to be running Python 3.8 or later, which means that even though I have
good old Python 3.6 installed, I'll need to go ahead and get myself a newer version. We'll
also need to have these modules installed. Since I have another version
of Python already installed, I will leave the "Add Python 3.9 to PATH" box
unchecked and then do it manually afterwards. We can see that the 3.9 version has
now appeared in the Python folder. However, if I open the console, it still uses
Python 3.6 by default. Since we'll be running YOLOv5 through the console, it is crucial
that we call the right version of Python. If your console already calls the right Python
version, feel free to skip the next step. Environment variables like PATH allow us to
call certain programs from the console anywhere in our system. Like, say, running
Python 3.9 inside the YOLOv5 repository folder. In "This PC" I will click on "Properties" and
then go into the "Advanced System Settings" to see the environment variables. I'll click on the
PATH variable and change Python 3.6 to Python 3.9. Now we can check and make sure that
we are indeed using Python 3.9. YOLOv5 uses PyTorch, currently one
of the most popular machine learning frameworks, to define models, run
inference, and perform training. ML frameworks typically provide
an easy and efficient way to run essential inference and training calculations on a GPU, which tends to be orders of magnitude
faster than running similar calculations on a CPU. Well, that's provided that you
go through the pain of installing everything correctly. But no
worries, that's what I'm here for! Let's install PyTorch as a Python module. We'll
go to the official "Get started" page and select the Windows build that we want. If you're
on an Intel-based Mac and have an NVIDIA GPU you can select Mac as your OS, and
proceed with enabling GPU support. Regardless of the OS, if you don't have an NVIDIA GPU you can also select the CPU
build and proceed with the video. Alright, if you're using the GPU option, we also need to install the correct version
of NVIDIA's parallel computing platform, CUDA, which is used to enable GPU acceleration
for training and inference. I'll go with CUDA 11. We can go to the CUDA 11 download page,
select our operating system - which is Windows 10 in my case - and
download the installer file. When it's downloaded, we can launch
the file and begin installation. Upon completion, we'll be
asked to restart the computer. Now that we have installed CUDA, we can
install GPU-compatible PyTorch as well. First, we will copy this `pip install` command.
Next, we'll run the console inside the Python scripts folder, paste the command from the
PyTorch website there, and press enter. After successfully installing PyTorch
and CUDA, there are a few other Python modules that YOLOv5 needs. There is a list of
these modules in the `requirements.txt` file, and we can run `pip install -r requirements.txt`
inside the YOLOv5 folder to install them. And, just as everything was going well,
we get a very scary looking error. There's a problem installing pycocotools. It's really easy to miss behind all
of the redness of this exception, but there's actually a little sentence asking us
to install Microsoft C++ Build Tools, with a link. So, I followed the instructions
and installed the build tools, which required another system restart. I also used a different `pip install`
command from the pycocotools GitHub page where they say they made minor
changes to get it to work on Windows. I ran the `pip install -r requirements.txt`
command again just to double check, and this time it does say that we have
all the necessary modules installed. Now, to run inference, we can open the
console inside of the YOLOv5 folder, and run the `python detect.py --source 1` command.
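Before running that command, it can help to confirm that the console is calling the intended Python version and that PyTorch can actually see the GPU. Here's a minimal sketch of such a check. The function names `python_is_supported` and `describe_torch` are made up for this example and are not part of YOLOv5; the 3.8 minimum comes from the repo requirements mentioned earlier.

```python
import sys


def python_is_supported(version_info=None, minimum=(3, 8)):
    """Return True if the interpreter meets the minimum (major, minor) version."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the (major, minor) part of the version.
    return tuple(version_info[:2]) >= tuple(minimum)


def describe_torch():
    """Report whether PyTorch is importable and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed for this interpreter"
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return f"PyTorch {torch.__version__} with CUDA GPU: {name}"
    return f"PyTorch {torch.__version__}, CPU only"


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", python_is_supported())
    print(describe_torch())
```

If this reports "CPU only" even though CUDA was installed, a likely cause is that the CPU build of PyTorch was selected on the "Get started" page, although other causes are possible.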
The integer is the index of the webcam to use. Since I have two webcams, I call `--source
1` to start inference using my second webcam. We can now see it working! YOLOv5 uses OpenCV in the
background to display video, and we can press the "Q" key inside
the OpenCV window to stop the program. We can also run inference on an image
or on a video by passing a path to it instead of the webcam index.
Let's try it with this image. We can even pass a YouTube video link here.
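The kinds of source mentioned so far (a webcam index, a path to an image or video, a video link) all go through the same `--source` parameter. Here's a rough sketch of how you might build that command from your own Python script. The helpers `classify_source` and `build_detect_command` are hypothetical names for this example, and the classification rules are my own simplification, not how `detect.py` itself decides what to do with a source.

```python
from urllib.parse import urlparse


def classify_source(source):
    """Guess what kind of source was given: 'webcam', 'url', or 'file'."""
    text = str(source).strip()
    if text.isdigit():
        # A plain integer such as 0 or 1 is treated as a webcam index.
        return "webcam"
    if urlparse(text).scheme in ("http", "https"):
        # Web links, for example a YouTube video link.
        return "url"
    # Anything else is assumed to be a path to an image or a video.
    return "file"


def build_detect_command(source, python="python"):
    """Build the argument list for running detect.py on the given source."""
    return [python, "detect.py", "--source", str(source)]


if __name__ == "__main__":
    # "my_image.jpg" is a placeholder file name used only for illustration.
    for example in (1, "my_image.jpg"):
        print(classify_source(example), build_detect_command(example))
```

To actually launch inference from Python, the list could be passed to `subprocess.run(build_detect_command(1), check=True)`, assuming the script is run from inside the YOLOv5 folder.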
However, please note that the first time I tried that it didn't work as it was saying that
some modules still weren't installed. I then went ahead and also pip installed the `pafy` and `youtube_dl`
modules, and then everything worked fine. One important thing to note here is that when
we start inference, by default it downloads the smallest YOLOv5 model. We can add the `--weights`
parameter to specify which model to use: the small, medium, large, or extra large
model. For example, to use the large model we pass `--weights yolov5l.pt` and the
weights will be downloaded automatically. The smaller models tend to be a little less
accurate but are faster. While the larger ones tend to require more computing power to
run, they usually do better with accuracy. It's definitely a tradeoff! It depends on
what hardware you have available for training, and where the model will be deployed: on a phone,
on your laptop, on a self-driving car, and so on. Now we know how to run inference. YOLOv5 is an
implementation of a YOLO-type network in Python with PyTorch. This means that we can open any of
the Python files, including `detect.py` that we were just running, modify them, and
fairly easily add them to our own Python projects. In comparison, the original YOLO networks were
programmed in C and used the Darknet deep learning framework. It definitely had its advantages in
speed - and, well, the fact that we wouldn't have YOLO networks without it - but it did make it
rather difficult to integrate with Python code. That's it for this video! Stay tuned for the next one where we'll cover data
collection and labelling. If you have any questions or comments please feel
free to leave them in the comment section down below, and I'll be happy to answer them! And
consider subscribing to see the upcoming parts of the series. Thank you for watching,
I hope you enjoyed it and found it useful!