Collect and Label Images to Train a YOLOv5 Object Detection Model in PyTorch | Part 2

Captions
Hey everyone, Ivan here! In this video we'll be talking about data collection and labelling. This video covers everything you need to know about preparing a dataset to train a custom YOLOv5 model. We'll actually start by asking if object detection is right for the problem you're working on. If it is, we'll move on to collecting training images, labelling tools, and, finally, using Weights & Biases to store a dataset in the cloud before downloading it when we start training on a virtual machine.

---

You're watching Part 2 of the YOLOv5 series. Watch Part 0 to learn more about the YOLOv5 and Weights & Biases integration, and watch Part 1 to learn how to install YOLOv5 for real-time object detection on Windows and Google Colab.

If you have any questions, feel free to drop them in the comment section down below, and let's get started!

---

Everything starts with determining if object detection is the right approach for your problem in the first place.

You see, object detection is powerful but has its limitations. By definition, object detection is all about detecting whether a given object is present in the image and drawing a rectangular bounding box around it. Instead of just saying that an image has a dog in it, we say the image has a dog present at the following coordinates.

So, if you're looking for a level of precision beyond a rectangular box, you may want to use other approaches. For example, image segmentation aims to assign a label to every pixel in the image. If you care about the orientation of an object in space, you may want to look into 3D object detection, since normal object detection gives you a flat, two-dimensional bounding box.

You might find that your problem has multiple parts. In this case, you can combine object detection with other techniques. For example, you could break a 3D orientation estimation problem into smaller parts. In one of the projects my friend Carlo Lepelaars was a part of, they isolated a stuffed animal inside a bounding box, cropped the result, and then passed it through another model to estimate its rotation.

One of my projects was building an app for visually impaired people to detect buses and identify their route numbers. Object detection was a perfect fit for part of this task because I cared about the location of the buses and the bus numbers, so that I could attribute each number to the bus it's on.

However, while object detection is perfect for detecting bus numbers, it's not so perfect for reading them. It's not practical to have tens of thousands of classes for all the possible bus number combinations. So, I used object detection just for what it does best: detecting the bus numbers. Then I cropped the images and used another model to read the cropped numbers.

---

If you've determined that object detection will be useful and relevant to the problem you're trying to solve, we can talk about the next step: data.

Training an object detector is a supervised learning problem, meaning that we need examples and the correct answers to those examples. One very important question for almost all supervised machine learning problems is "How are you going to get the data?"

You might already have a dataset in mind. In that case, you might be interested in supplementing your existing dataset. The fundamental rule of ML still applies here: the more quality data, the better.

If you don't have a dataset, you could create one! If you're creating your own dataset, you want to think about where the model will be deployed. For example, since I was working on an app that would be used at a bus stop to detect buses, I filmed buses approaching bus stops. I split those videos into frames that became part of my dataset.
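If you want to do the same, here's a minimal sketch of splitting a video into frames with OpenCV. The video filename, output folder, and every-10th-frame sampling rate are placeholder assumptions for the example, not values from the video:

```python
# Minimal sketch: split a video into individual frames with OpenCV.
# The filename, output folder, and sampling rate are placeholders.
import os
import cv2

video_path = "bus_stop.mp4"  # hypothetical source video
out_dir = "images"           # frames are saved here, ready for labelling
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
frame_idx = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:  # end of the video (or an unreadable file)
        break
    if frame_idx % 10 == 0:  # keep every 10th frame to avoid near-duplicates
        cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
        saved += 1
    frame_idx += 1
cap.release()
print(f"Saved {saved} frames to {out_dir}/")
```

Sampling every Nth frame matters because consecutive video frames are nearly identical, and near-duplicate images add labelling work without adding much signal.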
Another thing you can do is look at public datasets and dataset platforms. COCO, or Common Objects in Context, is a very popular object detection and segmentation dataset with over 120k images and 80 different classes. Here's a link to my video on the YOLO format for the COCO dataset.

You can also look at Kaggle, Roboflow, and other platforms hosting public datasets. Or, try running a quick Google search for the type of dataset you're looking for plus "yolo format". A dataset in YOLO format means that the images are already labeled, so you can save a lot of time by not having to label the images yourself.

A fourth way of collecting data is using the free images on the Internet. There are many tools that can help automate the process of downloading these images. Just make sure that you're allowed to use the images you're downloading.

---

I want to emphasize the importance of having true negatives in your data. In the case of object detection, this means having images without the object you're looking for.

For example, I added images with no buses to my dataset, so that the model could learn when not to detect buses. I also added images of trucks, because they might look similar to buses. That way the model learns what's NOT a bus, and we can avoid potential misdetections in the future. The easiest way for me to get many images of trucks was, again, the COCO dataset.

In my case, I ended up with 7000 bus images, which allowed the model to produce good results. However, you don't necessarily need that many images to start training. A good number to start with can be just a thousand. YOLOv5 also has some data augmentation built in, which can help with smaller datasets.

If the results look promising, you can continue adding more images while tracking the accuracy of your models. We'll use Weights & Biases for that in Part 3 of the series.

---

Now that we have our images, how do we label them?

Different object detectors have different label formats. YOLOv5 and many other YOLO-type networks use two files with the same name but different file extensions. One file is a JPEG image file and the other is a .txt text file.

The image is just a normal image - that's pretty simple. The .txt file is used to store the labels: the types of objects present in the image and the coordinates of their bounding boxes.

---

The number of rows indicates the number of objects in the image. Each row has 5 parameters: the index of the object's class, the x and y coordinates of the bounding box center, and the width and height of the bounding box. The coordinates and bounding box dimensions are normalized to values between zero and one, i.e. expressed as a fraction of the image dimensions.

---

For example, I'll draw a very elongated bounding box at the bottom of the image. In the .txt file we see that the x and y coordinates of the bounding box center are about 0.5 and 0.93. These values are normalized to the image dimensions, so the x coordinate is located about 50% across the image (left to right) and the y coordinate is 93% down the image (top to bottom). The width of the bounding box is 97% of the entire image width, and the height of the bounding box is 10% of the entire image height.
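To make the format concrete: the label row for a box like this would read something like `3 0.50 0.93 0.97 0.10` (class index, then x-center, y-center, width, height). Below is a minimal sketch that converts normalized YOLO coordinates back to pixels; the label filename and image size are placeholder assumptions:

```python
# Sketch: read a YOLO-format label file and convert the normalized
# boxes back to pixel coordinates. The filename and image size are
# placeholder assumptions.
img_w, img_h = 1920, 1080  # assumed image dimensions in pixels

with open("frame_00001.txt") as f:  # each row: "class x_center y_center width height"
    for line in f:
        cls, xc, yc, w, h = line.split()
        xc, yc, w, h = float(xc), float(yc), float(w), float(h)
        # turn the normalized center/size into a pixel-space top-left corner
        x1 = (xc - w / 2) * img_w  # left edge in pixels
        y1 = (yc - h / 2) * img_h  # top edge in pixels
        print(f"class {cls}: corner=({x1:.0f}, {y1:.0f}), "
              f"size={w * img_w:.0f}x{h * img_h:.0f}")
```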
Obviously we can't just go count every pixel and type out this information by hand! We're going to need to run some code.

Labelling software really comes in all different sizes and flavors. When I asked people on Twitter what they use to label images, I received the following recommendations: LabelImg, Labelme, the semi-autonomous DarkLabel, Labelbox, SuperAnnotate, and VIA from VGG Oxford, which runs in the browser.

My favorite labelling tool is an open-source repo called OpenLabelling by João Cartucho. It's powered by OpenCV, which I'm very familiar with, so it was easy to add extra features that made it perfect for my needs.

I'll show the labelling process using a modified version of OpenLabelling, but feel free to use any labelling tool that supports the YOLOv5 format. I'll link to OpenLabelling and the modified version in the description.

The first thing we need to do, as usual, is download and unzip the labelling repo. We'll put the images we want to label into the images folder. According to the YOLO format, .txt files with the same filenames will be created in the bbox_txt folder.

We need to make sure that we have these Python modules installed in order to run ModifiedOpenLabelling. Then we run the run.py file to launch the labelling tool. I'll open it as a Python file and then press F5 to run the code. A window that we can resize or make fullscreen will pop up and display the image.

There's a sliding bar at the top to switch the image we're currently on and the class we're currently drawing a bounding box for. We can also use the "A" and "D" keys to switch between images and the "W" and "S" keys to switch between classes.

The first left mouse click defines the top left corner of a bounding box, and the second click defines the bottom right corner. A right click inside a bounding box deletes it. All of the bounding boxes we draw get automatically added to the .txt file that corresponds to the image.

We can specify the number of classes and their names in the class_list.txt file. In my case, the class closed_door has the index 0, opened_door 1, bus 2, and number 3.

You want the classes to be specific enough for the model to generalize well, but not so specific that there isn't enough data to support them. Remember: the class bus_number is better than bus_number_86.

Now, I'll describe how I modified the OpenLabelling tool. First, I added the ability to switch classes with the digit keys on the keyboard. This comes in handy when there are many classes. For example, I press 1 to switch to the class with index 0, I press 2 to switch to the class with index 1, and so on.

I also added the ability to move an image and its label directly to the local recycle bin from inside OpenLabelling by pressing the R key. It helps when you encounter a confusing image that you don't want to include in the dataset, so you can quickly remove it and continue labelling.

Labelling is a pretty tedious process, but it can be quite rewarding when it helps your model perform better. Now, let's learn about taking our labelled data and backing it up in the cloud. Then we can use it for training in the future.

We want to use the data that we've just labelled to train in the cloud: on Google Colab or on whatever cloud platform you prefer. For that we'll use Weights & Biases Artifacts, which let us version our datasets and models. You can watch Part 0 of the series to learn more about Artifacts and the YOLOv5 and Weights & Biases integration.

First, I will run a script inside the ModifiedOpenLabelling folder to randomly split the data into training and validation sets. We can specify the ratio of the split here, and run the script to copy the images into the new folders.

The script creates a custom_dataset directory, and the images and labels for the training and validation sets end up in the appropriate folders. Next up, we can copy the custom_dataset folder into the YOLOv5 folder, which we installed locally in the first part of this series.

After that, we'll need a .yaml file for our dataset. YOLOv5 uses this file to understand where to take the images and labels for training, what the names of the classes are, how many classes there are, and so on. Let's make a copy of the voc.yaml file that came with the YOLOv5 repo, name it custom_dataset.yaml, and edit it a little bit.

Let's delete all the lines we don't need, and change the paths, number of classes, and class names. The result should look something like this:
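(The snippet below is a reconstruction based on what's described above, so treat the paths as assumptions that need to match where your custom_dataset folder actually lives.)

```yaml
# custom_dataset.yaml -- paths are relative to the yolov5 folder
train: custom_dataset/images/train  # training images
val: custom_dataset/images/val      # validation images

nc: 4  # number of classes
names: ['closed_door', 'opened_door', 'bus', 'number']
```

The train and val entries point YOLOv5 at the image folders; it finds the matching label files by swapping "images" for "labels" in those paths.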
Now, let's save it and upload our dataset as an Artifact.

We'll open the console inside the YOLOv5 folder and run `pip install wandb` to install the Weights & Biases Python client.

Next, we'll run this command to upload our dataset as an Artifact into a W&B project named custom_yolov5.
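At the time of this video, the uploader script ships inside the YOLOv5 repo itself. The exact script path and flags below may differ between YOLOv5 versions, so double-check them against your copy of the repo:

```bash
# Sketch: upload the dataset described by custom_dataset.yaml as W&B Artifacts.
# The script location may vary across YOLOv5 releases.
python utils/wandb_logging/log_dataset.py --project custom_yolov5 --data custom_dataset.yaml
```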
It may prompt you to log into Weights & Biases (or quickly create a new account) if it's your first time using the wandb Python client.

Notice that the custom_dataset_wandb.yaml file appears in the yolov5 folder. We'll use this file in the future to start training on the dataset that we uploaded as a W&B Artifact.

We can follow this link to open our W&B project. This data is now stored in the cloud. We can access it no matter which machine we plan on using for training!

If we navigate to the Artifacts tab, we'll see two new dataset Artifacts appear: one for the training data and one for the validation data.

If we click on one of the Artifacts and go into the Files tab, we can open it as a W&B Table! W&B Tables are a way of interactively exploring data. We can visualize, query, and analyze tabular data right in the browser!

Weights & Biases Artifacts is a tool for model & dataset versioning, a.k.a. keeping track of the changes to your models and datasets.

I will now add a random bounding box to this image just by editing the labels .txt file. Then I'll upload the dataset as an Artifact again. W&B Artifacts scans every file and creates a new version of the dataset if it detects any change in a file's name or contents.

Here we can see the random bounding box. We can also compare different versions of datasets side by side and see the difference. You can see that the old version has three bounding boxes, and that the new version has four of them.

Most importantly, W&B Artifacts only saves the new or changed files and just references the unchanged ones, meaning that we're not wasting any storage.

That's it for this video. Stay tuned for the next one, where we'll cover training a YOLOv5 model while leveraging Weights & Biases experiment tracking tools to their full power!

If you have any questions or comments, please feel free to leave them down below, and I'll be happy to answer them! And consider subscribing to our channel to see the upcoming parts of the series. Thank you for watching, I hope you enjoyed it and found it useful!
Info
Channel: Weights & Biases
Views: 35,038
Id: a9Bre0YJ8L8
Length: 17min 22sec (1042 seconds)
Published: Thu Jul 08 2021