YOLO V8 is one of the most powerful computer
vision technologies ever made in this video I'm going to show you how to do object
detection image classification image segmentation and pose detection on your own
custom data using yolo V8 I'm going to show you the entire process how to annotate your
data how to do the training how to analyze the performance of the model you trained and how
to use the model to make predictions this is a full comprehensive tutorial on
yolo V8 and now let's get started hey my name is Felipe and welcome to my channel
in this video we are going to train an object detector using yolo V8 and I'm going to walk you
step by step through the entire process from how to collect the data you need in order to train an
object detector how to annotate the data using a computer vision annotation tool how to structure
the data into the exact format you need in order to use yolo V8, how to do the training and
I'm going to show you two different ways to do it; from your local environment and also from
a Google collab and how to test the performance ofthea model you trained so this is going to be
a super comprehensive step-by-step guide of everything you need to know in order to train
an object detector using yolo V8 on your own custom data set so let's get started let's
start with this process and the first thing we need to do is to
collect data, the data collection is the first step in this process remember that if you want to train
an object detector or any type of machine learning model you definitely need data, the algorithm, the
specific algorithm you're going to use in this case yolo V8 is very very important but the data
is as important as the algorithm if you don't have data you cannot train any machine learning model
that's very important so let me show you the data I am going to use in this process these are some
images I have downloaded and which I'm going to use in order
to train this object detector and let me show you a few of them these are some images of alpacas
this is an alpaca data set I have downloaded for today's tutorial and you can see these are all
images containing alpacas in different postures and in different situations right so this is
exactly the data I am going to use in this process but obviously you could use whatever data set you
want you could use exactly the same data set I am going to use or you can just collect the data
yourself you could just take your cell phone or your camera or whatever and take the
pictures, the photos, the images you are going to use, you can do your own data collection
or something else you could do is to use a publicly available data set so let
me show you this data set this is the open image dataset version 7 and this is a dataset which is
publicly available and you can definitely use it in order to work on today's tutorial in order to
train the object detector we are going to train in today's tutorial so let me show you what it looks
like if I go to explore and I select detection you can see that I'm going to unselect all
these options you can see that this is a huge data set containing many,
many categories, I don't know how many but there are a lot it contains millions of images,
hundreds of thousands if not millions of annotations, and thousands of categories
this is a super huge data set and you can see that you have many different
categories now we are looking at trumpet and you can see these are different images with trumpets
and from each one of these images we have a bounding box around the trumpet and if I show you
another one for example we also have beetle and in this category you can see we have many different
images of many different types of beetles so this is another example or if I show you this one
which is bottle, we have many different images containing bottles, for example there you can see
many different types of bottles, and in all cases we have a bounding box around the bottle and I could
show you I don't know how many examples because there are many many many different categories
so remember the first step in this process is the data collection this is the data I am going
to use in this project, which is a dataset of alpacas, and you can use the exact same data
I am using if you want, you can use the same data set of alpacas, or you can collect your
own data set by using your cell phone, your camera or something like that, or you can also download
the images from a publicly available dataset, for example the Open Images Dataset version 7 if you
decide to use open images dataset version 7 let me show you another category which is alpaca this
is exactly where I have downloaded all of the images of alpacas from so in case you decide to use
this publicly available data set I can provide you with a couple of scripts I have used in order to
download all this data, to parse through all the different annotations, and to
format this data in the exact format we need in order to work on today's tutorial so in case
you decide to use the Open Images dataset I am going to give you a couple of scripts which are going to
be super useful for you so that's all I can say about the data collection remember you
need to collect data if you want to train an object detector and you have all those different ways
to do it and all these different categories and all these different options so now let's move on
to the next step and now let's continue with the data annotation you have collected a lot of images
as I have over here you have a lot of images which you have collected yourself or maybe you have
downloaded this data from a publicly available data set and now it's the time to annotate this
data set maybe you were lucky enough when you were creating the dataset and maybe this data set you
are using is already annotated maybe you already have all the bounding boxes from all of your
objects from all your categories maybe that's the case so you don't really need to annotate your
data but in any other case for example if you were using a custom data set, a dataset you have collected
yourself with your own cell phone your camera and so on something you have collected in that case
you definitely need to annotate your data so in order to make this process more comprehensive in
order to show you like the entire process let me show you as well how to annotate data so we are
going to use this tool which is CVAT this is a labeling tool I have used it many many times in
many projects I would say it's one of my favorite tools I have used pretty much all of
the object detection and computer vision related annotation tools, maybe not all of
them, but many, and if you are familiar with annotation tools you will
know that there are many of them and none is perfect I would say all of the different
annotation tools have their advantages and their disadvantages and for some situations you prefer
to use one of them and for other situations it's better to use another one CVAT has many advantages
and it also has a few disadvantages I'm not saying it's perfect but nevertheless this is a tool I
have used in many projects and I really really like it so let me show you how to use it you
have to go to cvat.ai and then you select try for free there are different pricing options
but if you are going to work on your own or in a very small team you
can definitely use the free version so I have already logged in, this is my
account, but if you don't have an account then you will have to create a new one,
you're going to see a sign-up page and you can just create a new account and then
log into that account so once you are logged into this annotation tool you need to
go to projects and then create a new one I'm going to create a project which is called alpaca
detector because this is the project I am going to be working in and I'm going to add a label
which in my case is going to be only one label which is alpaca and then that's pretty much all
submit and open I have created the project it has one label which is alpaca remember if your project
has many many different labels add all the labels you need, and then I will go here which is create
a new task I am going to create a new annotation task and I'm going to call this task something
like alpaca detector annotation task zero zero one this is from the project alpaca detector and this
will take all the labels from that project now you need to upload all the images you are going to
annotate so in my case I'm obviously not going to annotate all the images because you can see these
are too many images and it doesn't make any sense to annotate them all in this video these
are 452 images so I'm not going to annotate them all but I'm going to select a few in order to show
you how exactly this annotation tool works and how exactly you can use it in your project also in my
case, as I have downloaded these images from a publicly available data set, the
Open Images Dataset version 7, I already have the annotations, I already have all the
bounding boxes, so I don't really need to annotate this data because I already have the
annotations but I'm going to pretend I don't so I can just label a few images and I can show you
how it works so now I go back here and I'm just going to select something like this many images
right yeah I'm just going to select this many images I'm going to open these images and then
I'm going to click on submit and open right so this is going to create this task and at the same
time it's going to open this task so we can start working on our annotation process okay so this is
the task I have just created I'm going to click here on the job number
and this will open all the images, and now I'm going to start annotating them so we
are working on an object detection problem so we are going to annotate bounding boxes we need to
go here and for example if we were detecting many different categories we would select which
category we are going to label now and that's it in my case I'm always going to label the same
category, which is alpaca, so I don't really need to do anything here so I'm going to select shape
and let me show you how I do it I'm going to click in the upper left corner and then in the
bottom right corner so the idea is to enclose the object and only the object right, the idea is to
draw a bounding box around the object, you only want to enclose this object
and you can see that we have other animals in the back right we have other alpacas so I'm just going
to label them too and there is a shortcut which is pressing the letter N and you can just create
a new bounding box so that's another one this is another one this is another alpaca and this is
the last one okay that's pretty much all so once you're ready you can just press Ctrl+S, which is
going to save the annotations I recommend pressing Ctrl+S as often as possible because it's
always a good practice so now everything is saved and I can just continue to the next image now we are
going to annotate this alpaca and I'm going to do exactly the same process I can start here obviously
you can just start in whatever corner you want and I'm going to do something like this okay
this image is completely annotated I'm going to continue to the next image in this case I am going
to annotate this alpaca too this is not a real alpaca, but I want my object detector to be able
to detect this type of object too, so I'm going to annotate it as well this is going to be a very
good exercise because if you want to work as a machine learning engineer or as a computer
vision engineer, annotating data is something you have to do very often, and actually training
machine learning models is something you have to do very often so usually the data annotation is
done by other people, right, it is done by annotators there are different
services you can hire in order to annotate data but in any case, whatever service you use,
it's always a very good practice to annotate some of the images yourself right because if
you annotate some of the images yourself you are going to be more familiar with the data
and you're also going to be more familiar with how to instruct the annotators to annotate this
particular data for example in this case it's not really challenging, we just have to annotate these
two objects but let me show you, there will be other cases, because there will always be situations
which are a little confusing in this case it's not confusing either, I just have to label
that object but for example a few images ago, when we were annotating this image, if an annotator
is working on this image and the instructions you provide are not clear enough, that person is going
to ask you hey, what do I do here, should I annotate this image or not, is this an alpaca or not
so that's one situation, another situation would be
what happened here, where we had many different alpacas in the background and some of them, for
example this one, are a little occluded so there could be an annotator, someone who asks you hey, do
you want me to annotate absolutely every single alpaca, or can I just draw a huge bounding box
here in the background and say everything in the background is an alpaca the thing is that
when an annotator is working on the images they are going to have many different questions
regarding how to annotate the data and they are all very good questions
because this is exactly what this is about I mean when you are annotating data you are defining exactly
what are the objects you are going to detect right so what I'm saying is that if you annotate some
of the images yourself you are going to be more familiar with all the different situations
and what exactly is going on with your data, so you are more clear on exactly what are the objects
you want to detect right so let's continue this is only to show a few examples this is another
situation in my case I want to say that both of them are alpacas so I'm just going to say
something like this but there could be another person who says no, this is only one annotation,
something like this right, I'm just going to draw one bounding box enclosing both of them,
something like that and it would be a valid criterion, I mean it would be a criterion which I guess would
be fine but whatever your criterion is, you need one right, you need a criterion and while you
are annotating some of the images you are going to further understand what exactly is
an alpaca what exactly is the object you want to consider as alpaca so I'm just going to continue
this is another case which may not be clear but I'm just going to say this is an alpaca this
black one which we can only see this part and we don't really see the head but I'm going to
say it's an alpaca anyway this one too this one too this one too also this
is something that always happens to me when I am annotating images, I become more
aware of all the diversity in these images for example this is a perfect example because
we have an alpaca which is being reflected in a mirror, and it's only a very small
section of the alpaca, only a very small piece of the alpaca's face so what
do we do here I am going to annotate this one too because yeah, that's my criterion, but another person
could say no, this is not the object I want to detect, only this part is the object I want to detect and maybe
another person would say no, this is not an alpaca, alpacas don't apply makeup on themselves, this is
not real, so I'm not going to annotate this image you get the idea right, there could be many different
situations and the only way to get familiar with all the different types of situations
is if you annotate some of the images yourself so now let's continue in my case I'm going
to do something like this because yeah I would say the most important
object is this one and then other ones are like... yeah it's not really that important if we detect
them or not okay so let's continue this is very similar to another image I don't know how many I have
selected but I think we have only a few left I don't know if these animals are natural... I'm very surprised about this, like the head, right, it has a lot of
hair over here and then the entire body is completely hairless I mean, I don't know, I'm
surprised, maybe they are groomed like that or maybe that's natural for an alpaca, who cares...
let's continue, we have only a few left, so let's
see if we find any other strange situation where we have to define if that's an alpaca or not so
I can show you an additional example also, when you are annotating you could define your bounding box
in many different ways for example in this case we could define it super tight to the object,
something like this, enclosing exactly the object, or we could be a little more relaxed right,
for example something like this would be okay too, and if we want to do it like this it would be okay
too right, you don't have to be super accurate, you can be a little more relaxed and it's
going to work anyway now this is the last one okay, I'm going to do something like this now I'm
going to take this I think this is also an alpaca but anyway I'm just going to annotate this part
so that's pretty much all, I'm going to save, and those are the few images I have selected in order
to show you how to use this annotation tool so that's pretty much all for the data annotation and
remember this is also a very important step this is a very important task in this process because
if we want to train an object detector we need data and we need annotated data so this is a very
very important part in this process remember this tool, CVAT, is only one of the many
available image annotation tools, you can definitely use another one if you want, it's
perfectly fine, it's not like you have to use this one at all, you can use whatever annotation tool
you want but this is a tool I think is very easy to use, it's
also a web application, so you don't really need to download anything to your computer, you can
just go ahead and use it from the web, that's also one of its advantages so yeah, this is the
tool I showed you how to use in this video in order to train this object detector so this is going
to be all for this step, and now let's continue with the next part in this process now that
we have collected and annotated all of our data, it comes the time to format this data, to
structure this data into the format we need in order to train an object detector using yolo V8
when you're working in machine learning and you're training a machine learning model, every single
algorithm you work with is going to have its own requirements on how to input the data that's going
to happen with absolutely every algorithm, it's going to happen with all the different YOLO
versions, and specifically, yolo V8 needs the data in a very specific format so
I created this step in this process so we can take all the data we have generated, all the
images and all the annotations, and convert everything into the format we need in order
to input this data into yolo V8 so let me show you exactly how we are going to do that if you
have annotated data using cvat you have to go to tasks and then you have to select this option,
export task dataset it's going to ask you for the export format, you can export this data into
many different formats, and you're going to scroll all the way down and
choose YOLO 1.1 right then you can also save the images, but in this case it's not really
needed, we already have the images, and you're just going to click ok
now if you wait a few seconds, or a few minutes if you have a very large data set, you are going to
download a file like this and if I open this file you are going to see all these different files
right you can see we have four items, actually three files and a directory, and if I open
the directory this is what you are going to see, many different file names and if I
go back to the images directory you will see that all these image file names look pretty
much the same right, you can see that the structure of these file names looks pretty
much the same as the ones we have just downloaded from cvat so basically the way
it works is that when you download this data in this format, in the YOLO format, every
single annotation file is downloaded with the same name as the image you have annotated
but with a different extension so if you have an image which was called something.jpg, then the
annotation file for that specific image will be something.txt right so that's the way it works
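That naming rule is simple enough to express in code; here is a small sketch (the function name and the example file names are mine, just for illustration):

```python
from pathlib import Path

def annotation_name(image_name):
    """YOLO pairs every image with a .txt annotation file that has
    exactly the same name, only the extension changes."""
    return Path(image_name).with_suffix(".txt").name
```

so for example `annotation_name("something.jpg")` gives you `"something.txt"`.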
and if I open this file you are going to see something like this, in this
case only one row, but let me show you another one which contains more than one annotation, I
remember there were many, for example this one, which contains two different rows each one of
these rows is a different object, and as I only have alpacas in this data set, each one of
these rows is a different alpaca and this is how you can make sense of this information: the first
number is the class you are detecting, in my case I only have one class so it's always a zero, and
then come four numbers which define the bounding box right, this is encoded in the YOLO format, which
means that the first two numbers are the position of the center of the bounding box, then you have
the width of your bounding box and then the height of your bounding box you will notice
these are all float numbers, and this basically means that everything is relative to the entire size of
the image so these are the annotations we have downloaded, and this is the exact same format
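As a quick aside, a row in that encoding is easy to decode with a few lines of Python; this is just an illustrative sketch, and the numbers in it are made up:

```python
def yolo_to_pixel(row, img_w, img_h):
    """Decode one YOLO label row, e.g. "0 0.5 0.5 0.25 0.4".

    The row holds the class id followed by the center x, center y,
    width and height of the box, all relative to the image size;
    this returns the class id and the box corners in pixels.
    """
    parts = row.split()
    cls = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    # shift from center/size to top-left and bottom-right corners,
    # scaling the relative values by the image dimensions
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return cls, (x1, y1, x2, y2)
```

for a 640x480 image, `yolo_to_pixel("0 0.5 0.5 0.5 0.5", 640, 480)` gives class `0` and the pixel box `(160.0, 120.0, 480.0, 360.0)`.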
we need in order to train this object detector. So remember, when I was downloading these
annotations we noticed there were many different options all of those options
are different formats in which we could save the annotations, and this is very important, because you
definitely need to download YOLO we are going to work with yolo and, if you select YOLO, everything
is pretty much ready to be input into yolo V8, it's exactly
the format you need in order to continue with the next steps and if you already have your data in
a different format, maybe because you have already collected and annotated your data, please
remember you will need to convert those annotations into the YOLO format now, formatting the
annotations is one of the things we need in order to structure
the data the way yolo V8 expects, but another thing we should do is
to create very specific directories containing this data right, we are going to need two directories,
one of them should be called images and the other one should be called labels you cannot
choose whatever names you want, you need to use exactly these two names: the images should be
located in a directory called images, and the labels should be located in a directory called
labels, that's the way yolo V8 works.
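A small script can sanity-check that layout before training; this is a sketch I'm adding for illustration, assuming your images are .jpg files:

```python
from pathlib import Path

def missing_labels(images_dir, labels_dir):
    """List the images in images_dir that have no matching .txt
    file (same file name, different extension) in labels_dir."""
    label_stems = {p.stem for p in Path(labels_dir).glob("*.txt")}
    return [p.name
            for p in sorted(Path(images_dir).glob("*.jpg"))
            if p.stem not in label_stems]
```

an empty list means every image has its annotation file, which is what you want before starting the training.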
within your images directory is where you are going to have your images if I click here you can
see that these are all my images, they are all within the train directory,
which is within the images directory this train directory is not absolutely needed
right, you could perfectly take all these images and just paste them directly in
the images directory and everything would be just fine but if you want,
you could do exactly as I did over here and have an additional directory in
between the images directory and your images, and you can call it whatever you want this
is a very good strategy in case you want to have for example a train directory containing all the
training images and then another directory which could be called validation for example and this
is where you are going to have many images in order to validate your process your training
process your algorithm and you could do the same with an additional directory which could be
called test for example or you can just use these directories in order to label the data right
to create different versions of your data which is another thing which is very commonly done so you
could create many directories for many different purposes and that will be perfectly fine but you
could also just paste all the images here and that's also perfectly fine and you can see that
for the labels directory I did exactly the same, we have a directory which is called train, and within
this directory we have all these different files and for each one of these files, let me
show you like this, it's going to be much better, for each one of
these txt files we have an image in the images directory which is called exactly the
same, exactly the same file name but with a different extension right so in this case this one ends in
.txt and this one ends in .jpg but you can see that it's exactly the same file name
for example the first image is called oa2ea8f and so on and that's exactly the same name as
for the first image in the images directory which is called oa2ea8f and so on so basically for
absolutely every image in your images directory you need to have an annotation file in
the labels directory which is called exactly the same but with a different extension:
if your images are .jpg, your annotation files are .txt so that's another thing which also
defines the structure you'll need for your data and that's pretty much all so remember you need
to have two directories, one of them called images, the other one called labels within the images
directory is where you're going to have all your images, and within your labels directory is where
you will have all your annotations, all your labels and for absolutely every single image in your
images directory you will need to have a file in the labels directory which is called exactly
the same but with a different extension if your images are .jpg your annotation files should
be .txt and the labels should be expressed in the yolo format, which is as many rows as
objects in that image every single one of these rows has the same structure: you
are going to have five terms, the first one is the class ID, in my case I only have one class
ID, I'm only detecting alpacas, so this number will always be zero, but if you're detecting
more than one class then you will have different numbers then you have the X and Y
position of the center of the bounding box, then the width, and then the height, and
everything is expressed in relative coordinates so basically this is
the structure you need for your data, and this is what this step is about so that's
pretty much all about converting the data, about formatting the data, and now let's move on to the
training now that we have taken the data into the format we need, it comes the time to take
this custom data set and train an object detector using yolo V8 so this is the yolo
V8 official repository one of the things I like the most about yolo V8 is that in order
to train an object detector we can do it either with python, with only a few python
instructions, or we can use a command line utility let me see if I find it over here... we can
execute a command like this in our terminal, something that looks like this, and that's pretty
much all we need to do in order to train this object detector that's something I really
liked, and something I'm definitely going to use in my projects from now on, because I think
it's a very convenient and easy way to train an object detector or a machine learning
model so this is the first thing to notice about yolo V8: there are two different ways
in which we can train an object detector, we can either do it in python, as we usually do, or
we can run a command in our terminal I'm going to show you both ways so you're familiar with both
ways and also, I mentioned that I am going to show you the entire process in a local environment, in a
python project, and I'm also going to show you this process in a Google Colab I know there are
people who prefer to work in a local environment, I am one of those people, and I know that there are
other people who prefer to work in a Google Colab so depending on which group you are in, I
am going to show you both ways and you can just choose the one you like the most so let's
start now let's go to PyCharm this is a PyCharm project I created for this training and
this is the file we are going to edit in order to train the object detector so the first thing I'm
going to do is to just copy a few lines, I'm going to copy everything and then remove
everything we don't need, copy and paste so we want to build a new model from scratch, so we are going
to keep this sentence, and then we are going to train a model, so we just remove
everything else and that's all right, these are the two lines we need in order to
train an object detector using yolo V8 now we are going to do some adjustments obviously the
first thing we need to do is to import ultralytics, which is the library we need in
order to import yolo and train a yolo V8 model, and this is a python library we need to
install as we usually do, we go to our terminal and run something like pip install plus the library
name in my case nothing is going to happen because I have already installed this library, but please
remember to install it and also please mind that this library
has many dependencies, so you are going to install many different python
packages it's going to take a lot of space, so definitely be ready for that, because you
need a lot of available space in order to install this library, and it's also going to take
some time because you are installing many different packages but anyway, let's continue
please remember to install this library and these are the two sentences we need in order to run
this training from a python script so this sentence we're just going to leave it as
it is this is where we are loading the specific yolo V8 architecture the specific yolo V8 model we are going to use you can see that we can choose from any of all of these different
models these are different versions or these are different sizes for yolo V8 you can see we have
Nano small medium large or extra large we are using the Nano version which is the smallest one
or the lightest one, so this is the one we are going to use, the yolo V8 Nano, yolov8n then about
the training about this other sentence we need to edit this file right we need a yaml file which
is going to contain all the configuration for our training so I have created this file and I have
named this file config.yaml I'm not sure if this is the most appropriate name but anyway this is
the name I have chosen for this file so what I'm going to do is just edit this parameter and I'm
going to input config.yaml this is where the config.yaml is located this is where the main.py is located, they are in the same directory so if I do this it's going to work just fine and then let
me show you the structure for this config.yaml you can see that this is a very very very simple
configuration file we only have a few keys which are path, train, val and then names right let's
start with the names let's start with this this is where you are going to set all your different
classes right you are training an object detector you are detecting many different categories many
different classes and this is where you are going to input is where you're going to type all of
those different classes in my case I'm just detecting alpacas that's the only class
I am detecting so I only have one class, it's the number zero and it's called alpaca but if you are
detecting additional objects please remember to include all the list of all the objects you are
detecting, then about these three parameters these three arguments the path is the absolute path to
your directory containing images and annotations and please remember to include the absolute path.
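For reference, a config.yaml following this description would look something like the sketch below; the absolute path is a placeholder you should replace with your own:

```yaml
path: /home/user/alpaca-dataset  # absolute path to the data root (placeholder)
train: images/train              # relative to 'path'
val: images/train                # same data reused for validation in this tutorial
names:
  0: alpaca
```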
I ran into some issues when I was trying to specify a relative path relative from this directory from
my current directory where this project is created to the directory where my data is located when
I was using a relative path I had some issues and then I noticed that there were other people
having issues as well I noticed that in the GitHub repository from YOLO V8 I noticed this is in the
the issues section there were other people having issues when they were specifying a relative path
so the way I fixed it and it's a very easy way to fix it is just specifying an absolute path remember this should be an absolute path so this is the path to this directory to the directory containing the images and the labels directories so this is the
path you need to specify here and then you have to specify the relative path from this location to
where your images are located like the specific images are located right in my case they are in
images/train relative to this path if I show you this location which is my root directory then if
I go to images/train this is where my images are located right so that's exactly what I need to
specify and then you can see that this is the train data this is the data the algorithm is going
to use as training data and then we have another keyword which is val right the validation dataset
in this case we are going to specify the same data as we used for training and the reason
I'm doing this is because we want to keep things simple in this tutorial I'm just going to show
you the entire process of how to train an object detector using yolo V8 on a custom data set
I want to keep things simple so I'm just going to use the same data so that's pretty much all
for this configuration file now going back to main that's pretty much all we need in order to
train an object detector using yolo V8 from python that's how simple it is so now I'm
going to execute this file I'm going to change the number of epochs I'm going to do this for only
one Epoch because the only thing I'm going to show you for now is how it is executed, I'm going to
show you the entire process and once we notice how everything is working once we know
everything is up and running everything is working fine we can just continue but let's just
do this process let's just do this training for only one Epoch so we can continue you can see that
now it's loading the data it has already loaded the data and you can make use of all the debugging information we can see here you can see now we are loading 452 images and we were able to load all the images right 452 out of 452 and if
I scroll down you can see that we have additional information additional values which are related
to the training process this is how the training process is going right we are training this object
detector and this additional information which we are given through this process so for now the
only thing we have to do is to wait until this process is completed so
I am going to stop this video now and I'm going to fast forward this video until the end of this
training and let's see what happens okay so the training is now completed and you can see that
we have an output which says results saved to runs/detect/train39 so if I go to that directory
runs/detect and train39 you can see that we have many many different files and these files are related to how the training process was done right for example if I show you these
images these are a few batches of images which were used in order to train this algorithm
you can see the name is train batch0 and train batch1 I think we have a train batch2 so we have a lot of different images of different alpacas which we used
for training and they were all put together they were all concatenated into these huge images so
we can see exactly the images which were used for training and The annotation on top of them right
the bounding boxes on top of them and we also have similar images but for the validation dataset
right remember in this case we are using the same data as validation as we use for training so it's
exactly the same data it's not different data but these were the labels in the validation data set
which is the training data set and these were the predictions on the same images right you can see
that we are not detecting anything we don't have absolutely any prediction we don't have absolutely
any bounding box this is because we are doing a very shallow training we are doing a very dummy
training we are training this algorithm only for one epoch this was only an example to show you what the output looks like and the entire process but it is not a real training but nevertheless
these are some files I'm going to show you better when we are in the next step
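To recap this step, the whole main.py sketched above comes down to just a few lines; a minimal sketch, assuming the file names used in the narration:

```python
from ultralytics import YOLO

# build a new YOLOv8 nano model from scratch
model = YOLO('yolov8n.yaml')

# config.yaml sits in the same directory as this script;
# epochs=1 is just the quick smoke test shown in the video
model.train(data='config.yaml', epochs=1)
```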
for now let me show you how the training is done from the command line from the terminal using the
command I showed you over here using a command like this and also let me show you how this training
is done on a Google colab so going to the terminal if we type something like this yolo detect train
data I have to specify the configuration file which is config.yaml and then model yolov8n.yaml
and then the number of epochs this is exactly the same as we did here it's going to produce exactly the same output I'm just going to change the number of epochs to one so we make it exactly the same and let's see what happens you can see that we have exactly
the same output we have loaded all the images and now we are starting a new training process and
after this training process we are going to have a new run which we have already created the new
directory which is train40 and this is where we are going to save all the information related
to this training process so I'm not going to do it because it's going to be exactly the same as the one we did before but this is exactly how you can use the command line or how you
can use this utility in order to do this training from the terminal you can see how simple it is
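Written out, the terminal command described here looks like this (same config file and model as in the script):

```shell
yolo detect train data=config.yaml model=yolov8n.yaml epochs=1
```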
it's amazing how simple it is and now let me show you how everything is done
from a Google colab so now let's go back to the browser so I can show you this notebook I created
in order to train yolo V8 from a Google colab if you're not familiar with Google collab the way
you can create a new notebook is going to Google Drive you can click new more and you select
the option Google collaboratory this is going to create a new google colab notebook and you
can just use that notebook to train this object detector now let me show you this notebook and
you can see that it contains only one two three four five cells this is how simple this will
be the first thing you need to do is to upload the data you are going to use in order to train
this detector it's going to be exactly the same data as we used before so these are exactly
the same directories the images directory and the label directory we used before and then
the first thing we need to do is to execute this cell which mounts Google Drive into
this instance of Google Colab so the only thing I'm doing is just pressing enter on this cell and this may take some time but basically the only thing it does is to connect to Google Drive so we can just access the data we have in Google Drive so I'm going to
select my account and then allow and that's pretty much all then it all comes down to where you have the
data in your Google drive right in the specific directory where you have uploaded the data in
my case my data is located in this path right this is my home in Google Drive and then this
is the relative path to the location of where I have the data and where I have all the files
related to this project so remember to specify this root directory as the directory where you have
uploaded your data and that's pretty much all and then I'm just going to execute this cell
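Those first cells amount to something like the sketch below; the Drive path is a placeholder for wherever you uploaded your data:

```python
from google.colab import drive

# attach your Google Drive to this Colab instance
drive.mount('/content/gdrive')

# root directory containing the images/ and labels/ folders (placeholder path)
ROOT_DIR = '/content/gdrive/My Drive/alpaca-dataset'
```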
so I save this variable I'm going to execute this other cell which is pip install ultralytics the
same command I ran from the terminal in my local environment now I'm going to run it in Google
Colab remember you have to start this command with an exclamation mark which means you are running a command in the terminal where this notebook is being executed so remember to include the exclamation mark everything seems to be okay everything
seems to be ready and now we can continue to the next cell which is this one you can see that
we have done exactly the same structure we have input exactly the same lines as in our
local environment if I show you this again you can see we have imported ultralytics then we have
defined this yolo object and then we have called model.train and this is exactly the same as we are
doing here obviously we are going to need another yaml file in our Google Drive and this is the file I have specified which has exactly the same configuration as the yaml file I showed you in my local environment it's exactly the same idea so this is exactly what you should do now you should specify an absolute path to your Google Drive directory and that's the only difference
and I see I have a very small mistake because I see I have data here and here I have just
uploaded images and labels in the directory but they are not within another directory which is called data so let me do something I'm going to create a new directory which is called data and put images and labels in there right so everything is consistent so now
everything is okay images then train and then the images are within this directory so everything
is okay now let's go back to the Google Colab every time you make an edit or every time you do
something on Google Drive it's always a good idea to restart your runtime so that's what I'm going
to do I'm going to execute the commands again I don't really need to pip install this Library
again because it's already installed into this environment and then I'm going to execute this
file I think I have to do an additional edit which is this file now it's called google_colab_config.yaml and that's pretty much all I'm just going to run it for one epoch so everything is exactly
the same as we did in our local environment and now let's see what happens so you can see that
we are doing exactly the same process everything looks pretty much the same as it did before we
are loading the data we are just loading the models everything it's going fine and
this is going to be pretty much the same process as before you can see that now it takes
some additional time to load the data because now you are running this notebook in a given environment and you're taking the data from your Google Drive so
it takes some time it's a slower process but it's definitely the same idea so the only thing we need to do now is just to wait until this process is completed and that's pretty much all
I think it doesn't really make any sense to wait because it's like it's going to be exactly the
same process we did from our local environment at the end of this execution we are going to have
all the results in a given directory which is the directory of the notebook which is running this
process so at the end of this process please remember to execute this command which is going
to take all the files you have defined in this runs directory which contains all the runs you
have made all the results you have produced and it's going to copy this whole directory into the directory you have chosen for your files and your data in your Google Drive and so on
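That final cell is essentially a one-line copy run through the shell escape; a sketch, with the destination path as a placeholder:

```shell
# inside a Colab cell this line is prefixed with '!'
# copies the runs directory (all training results) into Drive so it persists
cp -r /content/runs '/content/gdrive/My Drive/alpaca-dataset'
```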
please remember to do this because otherwise you would not be able to access this data and
this data which contains all the results and everything you have just trained so this is how
you can train an object detector using yolo V8 in a Google Colab and you can
see that the process is very straightforward and it's pretty much exactly the same process exactly
the same idea as we did in our local environment and that's it so that's how easy it is to train an object detector using yolo V8 once you have done everything we did with the data right once
you have collected the data you have annotated data you have taken everything into the format
yolo V8 needs in order to train an object detector once everything is completed then
running this process running this training is super straightforward so that's going to be
all about this training process and now let's continue with the testing now let's see how these
models we have trained how they performed right let's move to the next step and this is the last
step in this process this is where we are going to take the model we produced in the training
step and we're going to test how it performs this is the last step in this process this is how
we are going to complete this training of an object detector using yolo v8, so once we have trained
a model we go to this directory remember the directory I showed you before regarding the
directory where all the information was saved where all the information regarding this training
process was saved and obviously I'm not going to show you the training we just did because it was a very shallow training a very dummy training but instead I'm going to show you the results from another training I did when I was preparing this video where I conducted exactly the same process but the training was done for 100 epochs so it was a much deeper training
right so let me show you all the files we have produced so you know what are all the different
tools you have in order to test the performance of the model you have trained so basically you have
a confusion Matrix which is going to give you a lot of information regarding how the different
classes are predicted or how all the different classes are confused right if you are familiar
with how a confusion matrix looks or how it should look then you will know how to read
this information basically this is going to give you information regarding how all the different
classes were confused in my case I only have one class which is alpaca but you can see that
this generates another category which is like the default category which is background and we
have some information here it doesn't really say much it says how these classes are confused but
given that this is an object detector I think the most valuable information it's in other metrics in
other outputs so we are not really going to mind this confusion Matrix then you have some plots
some curves for example this is the F1 confidence curve we are not going to mind this plot either
remember we are just starting to train an object detector using yolo V8 the idea for this
tutorial is to make it a very introductory training a very introductory process so we are not going to mind all these different plots we have over here because it involves a lot of knowledge and
a lot of expertise to extract all the information from these plots and it's not really the idea for
this tutorial let's do things differently let's focus on this plot which is also available in
the results which were saved into this directory and you can see that we have many many many
different plots you can definitely go crazy analyzing all the information you have here
because you have one two three four five ten different plots you could knock yourself out
analyzing and just extracting all the information from all these different plots but again the idea
is to make it a very introductory video and a very introductory tutorial so long story short I'm just going to give you one tip the one thing you should focus on in these plots for now if you're going to take something from this video about how to test the performance of a model you have just trained using yolo v8 to train an object detector it is this make sure your loss is going
down right you have many plots some of them are related to the loss function which are this one this
one and this one this is for the training set and these are related to the validation set make
sure all of your losses are going down right this is like a very I would say a very simple way to
analyze these functions or to analyze these plots but I would say that it's more powerful than it would appear make sure all your losses are going down because given the loss function we
could have many different situations we could have a loss function which is going down which
I would say it's a very good situation we could have a loss function which started to go down and
then just it looks something like a flat line and if we are in something that looks like a flat line
it means that our training process has gotten stuck it could be a good thing because maybe the algorithm the machine learning model really learned everything it had to learn about this data so maybe a flat line is not really a bad thing maybe I don't know you would have to
analyze other stuff or if you look at your loss function you could also have a situation
where your loss function is going up right that's the other situation and if you my friend
have a loss function which is going up then you have a huge problem then something is obviously
not right with your training and that's why I'm saying that analyzing your loss function what
happens with your loss is going to give you a lot of information ideally it should go down if
it's going down then everything is going well most likely, if it's something like a flat line
well it could be a good thing or a bad thing I don't know we could be in different situations
but if it's going up you have done something super super wrong I don't know what's going on
in your code I don't know what's going on in your training process but something is obviously
wrong right so that's like a very simple and a very naive way to analyze all this information
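That naive rule of thumb, "make sure the losses trend down", is easy to encode if you ever want to check it programmatically; a toy sketch, independent of any particular YOLO output format:

```python
def trending_down(losses, tail=10):
    """Crude check: is the mean of the last `tail` values lower than the mean of the first `tail`?"""
    head = sum(losses[:tail]) / tail
    recent = sum(losses[-tail:]) / tail
    return recent < head

# a healthy-looking loss curve trends down; a flat one does not
print(trending_down([1.0 - 0.008 * i for i in range(100)]))  # True
print(trending_down([0.5] * 100))                            # False
```

You could feed this the loss columns from the results file saved into the run directory, but eyeballing the plots as described here works just as well.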
but trust me that's going to give you a lot of information to start working on testing the performance of this model but I would say that looking at the plots and analyzing
all this information and so on I would say that's more about research, that's what people
who do research like to do and I'm more like a freelancer I don't really do research so
I'm going to show you another way to analyze the performance of the model we have just trained which from my perspective makes more sense and it involves seeing how it performs with real data right how it performs with data you have
used in order to make your inferences and to see what happens so the first step in this more
practical more visual evaluation of this model of how this model performs is looking at these images
and remember that before when we looked at these images we had this one which was regarding the
labels in the validation set and then this other one which were the predictions were completely
empty now you can see that the predictions we have produced are not completely empty and we are detecting the position of our alpacas super accurately we have some mistakes actually for example here we are detecting a person as an alpaca here we are also detecting a person as an alpaca and we have some misdetections for example this should be an alpaca and it's not being detected so we have some misdetections but you can see that the results are pretty much okay
right everything looks pretty much okay the same about here if we go here we are detecting pretty
much everything we have a misdetection here we have an error over here because we are detecting
an alpaca where there is actually nothing so things are not perfect but everything seems to be pretty much
okay that's the first way in which we are going to analyze the performance of this model which is
a lot because this is like a very visual way to see how it performs we are not looking at plots we
are not looking at metrics right we are looking at real examples and to see how this model performs
on real data maybe I am biased to analyze things like this because I'm a freelancer and the way it
usually works when you are a freelancer is that if you are building this model to deliver this
project for a client and you tell your client oh yeah the model was perfect take a look at all
these plots take a look at all these metrics everything was just amazing and then your client
tests the model and it doesn't work the client will not care about all the pretty plots and so
on right so that's why I don't really mind a lot about these plots maybe I am biased because I am a
freelancer and that's how freelancing works but I prefer to do like a more visual evaluation
so that's the first step we will do and we can already notice we are having a better performance we are having an okay performance but remember the data we are currently looking at right now the validation data was pretty much the same data we used for training so this
doesn't really say much I'm going to show you how it performs on data which the algorithm has never seen with completely and absolutely unseen data and this is a very good practice if you
want to test the performance of a model, so I have prepared a few videos so let me show you these
videos they are basically... remember this is completely unseen data and this is the first video
you can see that this is an alpaca which is just being an alpaca which is just walking around
it's doing its alpaca stuff it's having an alpaca everyday life it's just being an alpaca
right it's walking around from one place to the other doing nothing well no it's doing
its alpaca stuff which is a lot this is one of the videos I have prepared this is another video
which is also an alpaca doing alpaca related stuff um so this is another video we are going to
see remember this is completely unseen data and I also have another video over here so I'm
going to show you how the model performs on these three videos I have made a script in Python
which loads these videos and just calls the predict method from yolo v8, we
are loading the model we have trained and we are applying all the predictions to this model and
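A script along those lines can be sketched as follows; the weights path and video filename are assumptions, so adjust them to your own run:

```python
from ultralytics import YOLO

# load the weights produced by the training run (placeholder path)
model = YOLO('runs/detect/train/weights/best.pt')

# run detection on a video; save=True writes an annotated copy of the video
results = model.predict(source='alpaca.mp4', save=True)
```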
we are seeing how it performs on these videos so this is the first video I showed you and these
are the detections we are getting you can see we are getting an absolutely perfect detection
remember this is completely unseen data and I'm not going to say we are getting a 100% perfect detection because we're not but I would say it's pretty good in order to start working on this training process so this is one of
the examples then let me show you another example which is this one and this is the other video
I showed you and you can see that we are also detecting exactly the position of the alpaca
in some cases the text is going outside of the frame because we don't really have space but
everything seems to be okay in this video too so we are detecting exactly the position of this alpaca the bounding box in some cases doesn't really fit the alpaca's face but yeah everything
seems to be working fine and then the other video I showed you you can see in this case the detection
is a little broken we have many misdetections but now everything is much better and yeah in
this case it's working better too it's working well I would say in these three examples this one
it's the one that's performing better and then the other one I really like how it performed too in
this case where the alpaca was like starting its alpaca journey we have a very good detection and a very stable detection then it breaks a little but nevertheless I would say it's okay it's also detecting this alpaca over here so I will say it's working pretty much
okay so this is pretty much how we are going to do the testing in this phase remember that if you
want to test the performance of the model you have just trained using yolo V8 you will have a lot of information in this directory which is created when you train the model at the end of your
training process you will have all of these files and you will have a lot of information to knock
yourself out to go crazy analyzing all these different plots and so on or you can just keep it
simple and just take a look at what happened with the training loss and the validation
loss and so on all the loss functions make sure they are going down that's the very least thing
you need to make sure of and then you can just see how it performs with a few images or with
a few videos, take a look at how it performs with unseen data and you can make decisions from
there maybe you can just use the model as it is or you can just decide to train it again in this
case if I analyze all this information I see that the loss functions are going down and not only are they going down but I notice that there is a lot of room to improve this training, to improve the performance because we haven't reached that moment where everything just appears to be stuck right like a flat line we are very far away from there so that's something I would do I would do a new deeper training so we can just continue learning about this process also I would change the validation data for something that's completely different from the training
data so we have even more information and that's pretty much what I would do in order to iterate in
order to make a better model and a more powerful model hey my name is Felipe and welcome to my
channel in this video I'm going to show you how to make an image classifier using Yolo
V8 on your own custom data I'm going to show you every single step of this process from how
to organize the data so it complies with Yolo V8 how to do the training in your local computer
and also from a Google Colab how to validate the performance of the model you trained and finally
how to take the image classifier in order to make new predictions I'm going to show you the
entire process this is going to be an amazing tutorial and now let's get started so on today's
tutorial I'm going to show you how to train an image classifier using yolo V8 on your own
custom data set so let's get started and the first thing I'm going to do is to show you the
data I am going to use in this tutorial which is a weather related dataset let me show you the
different categories we have and let me show you how all the different images look
we have four different categories and they are cloudy, rain, shine and sunrise now let me show you
each one of these categories for example the cloudy category this is how the images look you
can see that in each one of these images we have a sky which is completely cloudy right we have many
different clouds for each one of these images now the sunrise category it's basically many different
pictures of sunrises so this is how this category looks and now for the shine category we have a
sky which is completely completely clear and with a super super bright sun right you have the sun
in each one of these images and it's super super bright and this is the rainy category and you can
see these are many different pictures of super rainy days so this is basically the dataset I am
going to use in this tutorial but obviously you can apply absolutely everything I'm going to show
you today to absolutely any type of data set you are going to be able to build any type of image
classifier with everything I'm going to say in this tutorial now let me show you the structure
you need for your data because if you're going to train an image classifier or if you're going to use yolo V8 yes the data is super super important but you also need to structure it to give like a format to all of your data so it complies with the way yolo V8 expects your data to be
right yolo V8 requires your data to be in a given format in a given structure so I'm going
to show you exactly how to structure your file system so everything looks the way it should to train
an image classifier using yolo V8 so if I show you I have a directory which is called weather
data set this is going to be the root directory you can call this directory whatever you want but
you need a directory which is going to be your root directory and inside this directory you can
see we have two different folders one of them is called train and the other one is called val and
this is exactly where you are going to have your training dataset and your validation dataset right
it's very important you name these directories exactly like this one of them should be called
train and the other one val now if I show you within the train directory this is where we are
going to have our four directories containing all the different images for all of our categories
basically you need to have as many directories as categories you want to classify with your model so
in my case I want to classify an image into four different categories and this is why I have four
different directories each one of these directories is named after the category
I want to classify my images in right one of them is called cloudy the other one is called rain then shine and then sunrise and these are the categories I want to classify all my images in and then within these directories, these folders is where I have all my data within cloudy
is where I have all my data related to the Cloudy category and so on right the same happens for
the rain and the shine and the sunrise category so this is basically the structure you need for
your data the structure you need for your file system in order to comply with what yolo v8
is expecting for your data and then if I go to the val folder you can see I have exactly the
same structure I have four different directories and they are named after the categories I want
to classify all my images and then if I open this directory it's exactly the same you can
see that I only have different images for that specific category now this is very important
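Laid out as a tree, the structure just described looks like this:

```
weather_dataset/
├── train/
│   ├── cloudy/
│   ├── rain/
│   ├── shine/
│   └── sunrise/
└── val/
    ├── cloudy/
    ├── rain/
    ├── shine/
    └── sunrise/
```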
because from now on everything is going to be super super straightforward if you have created
this structure for your file system if your data is exactly in the structure I showed you it is going to be super simple to train an image classifier in yolov8 so this is very very
very important now I'm going to show you three different ways in which you can train an image
classifier using yolo V8. Let's start with the first way, which is using a Python script; we are going to make a very simple script in Python in order to train this model, and let me show you how to do it. Let's go to PyCharm; this is a PyCharm project I created for today's tutorial, and the first thing you should do if you want to work with yolo V8 is to install a couple of dependencies, a couple of Python packages. These are the two packages we are going to use in this tutorial: one of them is ultralytics and the other one is numpy. Ultralytics is super important because this is exactly the library you need in order to import YOLO, in order to train this model using yolo V8, so you definitely need these two packages. Now, in order to install these packages, I'm going to show you a way which is going to work whatever your OS, right? If you are a Linux user, a Windows user, or if you use Mac, it doesn't matter, it's going to work anyway. You need to go to File, then Settings, and then you have to select Python Interpreter; this is the Python interpreter we are going to use, and you can see that I'm using Python 3.8. Then you need to click on plus, and this is where you're going to search for the packages you want to install. In my case I'm going to search for ultralytics, and the version I'm going to use, let me copy the version first, it's this one; so again File, Settings, plus, then ultralytics, and then the version is this one, okay? And then I click on Install Package. In my case I have already installed this dependency, so nothing is going to happen on my computer, but please remember to do it on your computer, because otherwise you will not be able to do anything of what we are going to be doing today. Now let's see numpy: the version we are going to use is 1.24.2, so File, Settings, plus, numpy, 1.24.2, everything is okay, now Install Package... everything is okay, numpy has been installed successfully, so now we are ready to continue. Once you have installed these two dependencies, these two packages, you're ready to continue and you're ready to train your own image classifier using yolo V8. So let's
go to main; this is the file we are going to use in order to code everything we need to train this classifier, and let me show you exactly what code you need to type in order to do this training. I'm going over here to the GitHub repository of yolo V8 and I'm going to select the classification section, right? I'm going for classification, and then here I'm going to click on classification docs; this is going to open a new URL, a new page, and this is exactly all the information we need in order to train this image classifier. I'm just going to scroll down to the train section, and this is what we're going to do: I'm going to copy and paste this line, which is the one in the middle, the one that says load a pretrained model (recommended for training). I'm just going to copy it, then I'm going back to PyCharm and I'm just going to paste it. Obviously we need to import YOLO, otherwise this is not going to work, so I'm going to say from ultralytics import YOLO, and that's pretty much all; you can see now we are creating our model, we are creating the object we are going to use as our model. Then I'm just going to copy and paste this last line, which is model.train; I'm going to paste it here and then I'm going to make a few edits. I'm going to leave the image size at 64, but for the number of epochs I'm going to set it to 1, right? Because the first thing we're going to do is a very, very dummy training, in order to make sure everything works as expected, and once we are completely, 100% sure everything is okay, we are going to move forward with a deeper, more real training. But for now let's just do the training for one epoch and let's see how it goes. Then for data, this is where you're going to specify the absolute path to the data you are going to train this model with, right? In my case it's going to be this weather dataset, so I'm just going to copy the absolute path of this dataset and paste it here; this is the data I am going to use. Remember that you need to specify the absolute path to the root directory of your data, and remember you need to structure your data into the exact format that I already mentioned, otherwise this is not going to work. And that's everything we need in order to train this image classifier, so the only thing I'm
going to do is to press play; I'm going to run this script, so let's see what happens. Remember we are running this training for only one epoch because we need to make sure everything works properly, and once it does, we are just going to edit this value and run the training for more epochs. You can see everything seems to be working properly, everything seems to be okay, completed and ready, so that's it, and you can see that the results have been saved here, in runs/classify/train12. Let me show you exactly where this location is in my file system. If I go to the PyCharm project I created, this is exactly the project, this is the file we are currently working in, the main.py file, and this is where my data is located, and this is where the runs directory, the runs folder, will be created. You can see that within runs we have another directory which is called classify, and here is where you will have many folders, one for each of your training processes; you can see that in my case I have trained this classifier many different times while I was preparing this video, so there are many directories for me, but this is exactly the one which was just created, train12, right? Train12 is the directory which was just created, and if I open this directory you can see we have another directory and then two files. I'm going to explain what exactly all these different files and folders are, and what information we have in them, but I'm going to do it later in this tutorial when we are validating this training process, right? For now just remember all the results will be saved within this folder, within the runs folder, then within classify, and then a new directory, a new folder, will be created for the training process you have just executed; this is something you need to remember for now, but later in this tutorial I'm going to show you exactly how you can validate the training using the information that's within this directory. But for now let's continue; I'm going
to show you now a different way in which you can train this image classifier using yolo V8: I'm going to show you how to do it from the command line, using this utility, and this is actually a very straightforward way to do this training. Let me show you. You can see that we have three different examples; I'm just going to select this one, copy and paste this instruction, this line, and I'm going to show you how it works. I'm just going to paste it here, and you can see that we have many different parameters, right? The first word is yolo, this is the utility we are going to execute; then classify, this is the task we are going to execute, we are going to train an image classifier; then we have another keyword, which is train; and then we have these arguments: data, model, epochs and also image size. I'm going to do exactly the same with image size, I'm just going to leave this value at 64, but I'm going to edit the other values, the number of epochs and data. For the number of epochs let's do something similar: I'm just going to do it for one epoch, so we make sure everything runs smoothly and properly, and then we can do a more serious, more real training for more epochs. This is exactly the model I'm going to use, so I'm not going to edit this keyword either; and then I'm going to edit this argument, the absolute path to my data, so this is going to be exactly the same as I have over here, something like this, okay? And that's pretty much all. The only thing I need to do now is copy and paste this sentence; I'm just going to a terminal and I'm going to do something like this, right? I have just copied and pasted that sentence, and you can see that that's all we need to do in order to train this image classifier using yolov8. You can see that the training process has started and everything is running super smoothly, so everything is going super well. That's a very quick and straightforward way to do this training. You can see the training has just been completed, and this is exactly where the results have been saved: runs/classify/train13. Everything is completed, everything is ready; you can see how simple and fast it is to train an image classifier just by running this command. Now I'm going to show you another way to do this training, which is using a Google Colab; we are going to use a Jupyter notebook in Google Colab in order to train this model, and this is also a very good way to do it. Let me show you how. Basically you need to go to your Google Drive, select New, then More, then Google Colaboratory, and this is going to open a new notebook in Google Colab; that's exactly what you need to do in order to use a notebook to train yolo v8. Now I'm going to show you a notebook I have already created in order to train this model, which is this one, called train.ipynb, and obviously I'm going to give you exactly this notebook in the GitHub repository of today's tutorial, so you can just use this notebook if you want. Now I'm going to show you all the different cells, everything that's already written in this notebook, so you understand exactly how to use it, how it works, and what exactly you are doing at each step. So let's start with the first step.
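One small tip that's not in the video: Drive uploads tend to go much faster as a single archive, so you may want to zip the dataset folder before uploading. A minimal sketch; the tiny stand-in folders are only there so the snippet runs anywhere:

```python
import shutil
from pathlib import Path

# Build a tiny stand-in dataset so this sketch is self-contained;
# in practice "weather_dataset" is the folder you prepared earlier.
root = Path("weather_dataset")
(root / "train" / "cloudy").mkdir(parents=True, exist_ok=True)
(root / "val" / "cloudy").mkdir(parents=True, exist_ok=True)

# Pack the whole dataset into one zip, ready to drag into Google Drive.
archive = shutil.make_archive("weather_dataset", "zip", root_dir=root)
print(archive)
```

After uploading, you can unzip it from a notebook cell, or simply upload the folder as-is if it's small.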
another thing you need to do if you want to train this image classifier is to upload all the data, with all the images and all your categories, into Google Drive. For example, in my case this is where I have my weather dataset; you can see that this directory is exactly the same directory I have over here, weather dataset, and within weather dataset there are two directories, train and val. If I open this directory you can see we also have train and val, so this is exactly the same data as in my local computer. This is something very important: you need the data in your Google Drive in order to train this model using a Google Colab, so please remember to upload your data into Google Drive. Now, once your data is in Google Drive, you need to be able to access it from the Google Colab, and in order to do that you need to execute this cell. If I click enter, you can see that now I'm going to be asked if I want to connect Google Colab with Google Drive; you can see that it's requesting my permission, I say Connect to Google Drive, then I select my account, then basically I scroll down to the bottom of this page and click Allow, and it's going to allow Google Colab to access all the data you have in your Google Drive. This is a very important step. Now, something else that's very important: you need to be able to access your data, so you need to know where your data is located in Google Drive, right? You need to know exactly the path, the location, of your data in Google Drive.
in my case, let me show you my Google Drive: you can see that my data is located in a directory inside my root directory, which is My Drive; then I have another directory which is called computer vision engineer, then another directory which is image classification yolo V8, and then data, and then this is where my weather dataset is located. In your case it's going to be different, obviously; it depends on where exactly you have uploaded your data. So something you may want to do is run ls: you can say something like ls /content/gdrive/MyDrive, right? If I execute this command you're going to see a very long list of files, which are basically all the files in my root directory in Google Drive, and for example this is where I have the directory which is called computer vision engineer; if I do ls again you're going to see all these different directories; if I go into image classification yolo V8, then this is data and train.ipynb, which is exactly this notebook; and then if I go into data, this is exactly where the weather dataset is located. So do something like that, because you definitely need to know the path of your data in Google Colab; you definitely need it in order to continue to the next step. This is very important, because if your data location is not set properly then yolo V8 will not be able to train your model. So in my case this is exactly where the weather dataset is located, this is the path to the weather dataset; this is the cell I am going to execute, and this is the value I'm going to save in data_dir. Now I'm going to continue: then we need to pip install
ultralytics, which is the library we need in order to train this model, in order to use yolo V8; the only thing you need to do is to execute this cell and everything will run super smoothly, and you can see that we have already completed this process. Now I'm going to continue, and the only thing we need to do now is to execute this cell. You can see that the code we have in this cell is very similar to the code we have over here, right? Basically we are running a Python script from a Google Colab, that's all we're doing. You can see we are importing os, and we are also importing the YOLO library, from ultralytics we're importing YOLO, and then we are doing exactly the same as before, and this is where we are using the data directory, the data_dir variable we have defined over here, right? This is why it's very important you set this variable properly. I'm going to do exactly the same as before, I'm just going to do this training for only one epoch so we make sure everything's okay; I'm going to press enter and that should be it in order to do all this training. The first time you execute this training it may take a little longer, because you are downloading all the weights, the models and everything, but after that everything should be much quicker, okay? You can see that now the training process is in progress, everything is going super well, and from now on the only thing we need to do is to edit the number of epochs so we do a deeper training; but I would say everything is working properly, so now let's move to the other cells, so I can show you what exactly you
need to do once everything is completed. Once everything is completed, the only thing you need to do is to run this cell, so you copy all the results which were saved in this directory into your Google Drive; because remember, you are working in a Google Colab environment, and if you don't do something like this it's going to be super hard for you to get the data you have just trained, right? To get your results, your model, your weights, is going to be super hard because everything is located in your Google Colab environment. Long story short, it's going to be much simpler and much better if you just copy everything, all the results which were saved in this directory, into your Google Drive; it's going to be much easier to download the weights, the results and so on. So now I'm just going to wait a couple of minutes until everything is completed over here, and then I can show you how to copy the results into your Google Drive. Okay, now the training process has been completed and you can see that the results have been saved into runs/classify/train; this is very similar to the output we saw when we were training on our local environment. Now the only thing we need to do is to copy everything into our Google Drive, so everything is much simpler if we want to download these results or do whatever we want. The only thing I'm going to do is to run this cell, and everything will be copied into this directory, which is the same directory where I have my data and my Google Colab notebook, right?
now you can see that everything has been copied already: this is the directory I have just copied, this is the current time, so this is the result of the cell I have just executed, and if I go to runs/classify/train you can see these are all the results we have generated; this is the CSV file containing many different results, which I'm going to show you in a few minutes.
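That copy cell boils down to a recursive directory copy; here's a pure-Python equivalent, with stand-in paths, since the real destination would be a folder inside your mounted Drive:

```python
import shutil
from pathlib import Path

# Tiny stand-in for the local results folder so this sketch is self-contained;
# in the notebook, the source is the runs folder Colab just wrote.
src = Path("runs") / "classify" / "train"
src.mkdir(parents=True, exist_ok=True)
(src / "results.csv").write_text("epoch\n")

# In the notebook the destination would live inside the mounted Drive
# (somewhere under /content/gdrive); here we copy to a local stand-in.
dst = Path("drive_backup") / "runs"
shutil.copytree("runs", dst, dirs_exist_ok=True)
print(dst / "classify" / "train" / "results.csv")
```

dirs_exist_ok=True lets you re-run the cell after later trainings without first deleting the old backup.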
and these are the weights and so on. From now on, if we want to get this data or analyze it, the only thing we need to do is to select runs and then click download, and it is going to download this whole directory into your local drive, right? You can see everything is being zipped, and once everything is zipped this directory will be downloaded to my local computer; you can see that this directory has just been downloaded, so everything is working just fine. This is pretty much all in order to show you three different ways in which you can train an image classifier using yolo V8, and now let's do the deeper training, right? I'm just going to take this script and edit the number of epochs so we do this training for something like 20 epochs; I have already been doing some tests, and 20 epochs is just enough for the dataset I am using in this tutorial, so 20 will be just fine. Now the only thing we need to do is to click on run; I'm just going to run this script as it is, and everything will be exactly the same as before, right? We just need to wait until this process is completed, we don't need to do anything from now on, but this process will be executed for 20 epochs, so the only thing I'm going to do is to wait until it is completed, and once everything is completed we are going to validate this training process. I'm going to show you how to analyze if all this process was done successfully or not, if you have trained a good image classifier or not. So I'm just going to pause the recording here and fast forward until this is completed. Okay, so the training process has been completed
and now let me show you all the results, which were saved here, into runs/classify/train14. Let me show you this directory, this folder, in my local computer: if I go to runs/classify and then train14, this is where all the results have been saved, and this is everything we are going to analyze now. We are going to decide if the model we have trained is a good model or not, if this is a model we can use or not. You can see that there are two files, args.yaml and results.csv, and another directory called weights. Let's start with args.yaml: if I open this file you can see that this is something like a config file; this is exactly the entire configuration which we have just used in order to train this model. This is very important because it is a super comprehensive list of all the hyperparameters we have used to train this model, and for example the only parameters we have specified are image size, number of epochs, and data, the location of the data we have just used.
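Since args.yaml is flat key: value lines, you can skim it (or diff two runs) without any YAML library; the sample below is a hypothetical excerpt, not my exact file:

```python
# Hypothetical excerpt of an args.yaml; the real file lives at
# runs/classify/train14/args.yaml and has many more keys.
sample = """task: classify
mode: train
epochs: 20
imgsz: 64
data: /home/user/weather_dataset
"""

# Parse flat "key: value" lines into a dict.
args = {}
for line in sample.splitlines():
    key, sep, value = line.partition(":")
    if sep:
        args[key.strip()] = value.strip()

print(args["epochs"], args["imgsz"])  # -> 20 64
```

In practice you would read the text with Path("runs/classify/train14/args.yaml").read_text() instead of the sample string, or just install PyYAML.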
and you can see that we have a keyword which is data, then epochs, then image size, and then many other keywords as well. This is very important because these are absolutely all the keywords we have used; we have used the default values which were set for all these other keywords, and this matters in case we want to train a new model and make some changes to some of these hyperparameters. Now let me show you the other file, which is results.csv. I would say this is much more important: this is the file containing all the information we need in order to decide if this is a good model or not. You can see that we have many different rows, one row for each of our training epochs, right? We have trained this model for 20 epochs and you can see that we have 20 rows, one per epoch, and for each of these rows we have all this different information. We are going to focus on three values: the training loss, the accuracy (this is the accuracy on the validation set), and the validation loss, right? These are the three values on which we are going to focus in this tutorial in order to validate this model, and I'm going to give you a very quick tip, a very quick way to analyze this training process, which is: make sure the training loss and the validation loss go down through the training process, and also make sure the accuracy goes up. I know you're thinking, hey, this is a very simple way to analyze this process, Felipe. Yeah, I agree with you, this is a very simple way, but at the same time it's very robust; it's a very simple but at the same time very powerful way to decide if you have a good model or not. Now, we could analyze all these numbers, but I think it's going to be much better, and much prettier, if we make a plot with all these numbers, right? Because we have the epochs in one column and we also have all these different values, and we can definitely plot these values across all
these different epochs. So let me show you a Python file I have created; this file is called plot_metrics, and if I open it you can see that basically we need to set the path to our results.csv file. In our case I'm going to set it to train14, and you can see this is runs/classify/train14 and then results.csv; and then there's only some very simple logic to take all the data from this results.csv file and make some plots with it, right? That's all we are doing, we're just taking the data and doing some plots.
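If you just want the quick health check without a plotting library, the CSV is easy to read with the standard library. The numbers below are toy values shaped like my run (best accuracy at epoch 16, 93.5% at the end), and the column names are the ones I see in my results.csv, so check the header of your own file:

```python
import csv
import io

# Stand-in for open("runs/classify/train14/results.csv").
sample = io.StringIO(
    "epoch,train/loss,metrics/accuracy_top1,val/loss\n"
    "1,1.20,0.55,1.10\n"
    "16,0.20,0.949,0.30\n"
    "20,0.15,0.935,0.28\n"
)
rows = [{k.strip(): v.strip() for k, v in r.items()} for r in csv.DictReader(sample)]

# The quick health check from the video: losses down, accuracy up.
train_loss = [float(r["train/loss"]) for r in rows]
val_acc = [float(r["metrics/accuracy_top1"]) for r in rows]
assert train_loss[-1] < train_loss[0] and val_acc[-1] > val_acc[0]

# And the epoch best.pt would correspond to: the highest validation accuracy.
best = max(rows, key=lambda r: float(r["metrics/accuracy_top1"]))
print(best["epoch"])  # -> 16 with this toy data
```

The keys are stripped because ultralytics pads the CSV header with spaces in some versions.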
and this file will be available in the GitHub repository of this tutorial, so you can definitely take it and use it to plot your metrics as well. All I'm going to do now is press play, and you can see that after only a few seconds we get these two plots, right? This is all the information in our CSV file; everything I showed you over here is summarized in these two plots. So this is exactly what I mean by make sure your loss is going down: this is your loss on the training set and on the validation set; the training-set loss is plotted in blue and the validation-set loss in red, and you can see that in both cases the loss is going down, which is exactly what we expect, exactly what we want. This is a very simple way to analyze this process, but trust me, it's also a very powerful way; something that looks like this is very healthy. Then for this other plot, which is how the validation accuracy evolves through the training process, you can see that the validation accuracy goes up as we increase the number of epochs, right? And starting from the 10th epoch or so everything starts to flatten out: we are not really gaining a lot of accuracy from there, but we are not losing accuracy either, we are just on something like a plateau, and this is exactly how a validation accuracy plot should look: we start from a very low value and then just increase our accuracy until we reach a very high value. This is a very healthy training process. Now, obviously we could make this process even better if we tuned some of these parameters and did a more customized training; I'm sure we would be able to get a better model, because remember we are using all the default values. As it usually goes, if we do a more customized training and try different parameters and so on, we should be able to get a better model, but we're not going to do it in this tutorial, because I just wanted to show you the end-to-end of how to train this image classifier; just remember you could do even better than this if you make a more custom model. So this is pretty much all for analyzing these plots, the validation accuracy and the loss function, in order to validate your training. And then there's
this directory, which is the weights directory. You can see that this directory is called weights, and this is exactly where the models are saved. This is very important, because you have trained a model and now obviously you want this model in order to use it on your images, on your data, and this is exactly where you are going to find it. You can see that you have two different files: one of them is called last.pt and the other one is called best.pt. Let me explain exactly what these two files are and what they mean. Remember how this training process works, right? You have a deep learning model which is comprised of many, many different weights, and the way it goes is that at the end of every epoch, at the end of the first epoch, the second epoch, the third epoch and so on, you are updating the weights of your model, the weights of your architecture, of your deep learning model. So at the end of every epoch you have a model available, which is the model you have trained so far. last.pt means that you are taking the model which was the result of the last epoch of your training process, right? At the end of absolutely every single epoch you have a model available, which you can definitely use if you want, in order to produce your inferences and so on; last.pt only means that you are taking the last model, the model which was produced at the end of your training process, at the end of the last epoch. So at the end of the 20th epoch in our training process we are producing this model, which is last.pt. But you may wonder: hey Felipe,
yeah, it's great, because at the end of our training process our accuracy is something like 93%, right? And 93% is a very good accuracy. But if we take the model at the end of the 16th epoch, for example, our accuracy is higher, it's 94.9%; maybe it makes more sense to take that model instead, right? Because we have an even higher accuracy. And if you ask me something like that, I would say yeah, you're perfectly right, that's a very valid argument, and that's exactly what the best.pt model is: we are saving the weights of the best model in our entire training process. So if we look at our data, the best model in our training process is this one, if I'm not mistaken: it's the model we produced at the end of the 16th epoch, and our validation accuracy was 94.9%, which is definitely higher than the accuracy we got at the end of the training process, which was 93.5%. If we want the best model we have produced in the entire training process, then we will definitely need to take this model; this is exactly what best.pt represents, the best model you have trained in your training process. If you ask me, what I usually do is take the model which was produced at the end of the training process; I usually take the last.pt file, because I consider that in the model produced at the end of the training process we are summarizing much more information, right? We are considering much more data, much more of everything in this training process; many things are going on, and remember there's a lot of randomness in this training process. So I, personally, consider that taking the model which was trained at the end of the process is a much better option than choosing another one, the best model, the model which got the highest accuracy but is not the last model. That's what I usually do: I usually take the last model, the one produced at the end of the training process. But if you want to take the best model, if you want to take best.pt, it also makes sense, because you are taking the model which produced the highest accuracy, right? So you can do either one of them, and I think that's why you have these two files, so you can use one or the other. And I would say that making the best decision on which model to use depends on many different variables: it depends on your data, your problem, your use case, your training process; it depends on many different things which option is best, right? So remember you have these two models, and it's all up to you, to your specific project and your preferences, which model you want to use: the best model which you have produced through the entire training process, or if you want to use the last model, the model which you have produced
at the end of your training process so now let's go back to pycharm because now it's time to make
our inferences now it's time to predict new samples right and we are going to input an image
and we're going to use our image classifier in order to predict which category this image belongs to so
let me show you how to do it I'm going to import from ultralytics import YOLO and then
let's go back to this page because now we are going to move to the predict section and
the only thing I'm going to do is to copy this sentence... going to paste it here and then
I'm going to specify the path the absolute path to the model which we have trained right
we don't really need to make it like the absolute path we can use the relative path
so I'm going to do something like this right sorry something like this so this is the path to
the model we have just trained right this is the last model which we produce at the end of this
training process and this is the model I'm going to use in order to show you how this works and
now let's copy this additional sentence which is results = model and the model path the image
path right you can see that you can use an image in your local computer in your file system or you
can also use something like a URL for example in this case in this example which is in the
yolo V8 website you can see that the example is using a URL and this is also going to work so
in my case I'm going to use an image in my local computer I'm going to use one of the images I used
for training because I only want to show you how this works but obviously you can use whatever
data whatever image you want so this is the image I am going to use I'm
just going to run inference on this image right which is the first image in my sunrise category data so
this is going to be something like sunrise1.jpg and this is pretty much all so these are
the results the first thing I'm going to do is just trying to run this code and let's see
what happens everything should run smoothly but this is where we are going to see if we have an
error or something like that we may need to wait a couple of seconds and everything seems to be working
fine because we didn't get an error so what I'm going to do now is I'm going to print results
because I want to show you a couple of things so this is the entire information we are getting
when we are printing results right you can see that this is a lot of information we have these
probabilities which are the inferences we are making this is exactly the result of applying
our image classifier and then we have a lot of information another object or another result
which is very important is this one which are the names of the categories we have just trained
our image classifier on right you can see this is cloudy rain shine sunrise and also you can see
that we have different integer values for each one of these categories so this is something like a
dictionary because we are going to have a result from applying our image classifier and then
with this result which is going to be an integer we are going to call this dictionary we're
going to call this object because we want to know exactly what's the name of the category we
have just inferenced right so this is how we're going to do it I'm going to call another variable
which is going to be something like a names dictionary names_dict and this is results
zero because results is a list in this case we only want to access the first element because
we are only predicting an individual image so this is the element we want and then we are
going to call dot names and that's pretty much all then I'm going to define another variable
which is probs and this is results 0 dot probs and this is the probability vector of all the
different categories we are trying to classify right so we are going to have a length 4 array
with the probabilities of the different classes we are classifying right so let me show you what
probs looks like I'm going to print probs and I'm going to do something else I'm going to call
tolist so we make this object into a list we are using yolo which is based on pytorch so if we
don't do this if we don't call this method we will be working with a torch object right with a tensor
and we don't really want that so that's why I'm calling tolist now I'm going to print
probs so I can show you what it looks like and I'll show you how to continue from here okay you can
see that this is the result we got from printing probs and you can see that this is a list
with four elements one two three and four and each one of these elements is the probability
of this image to be one of these categories right let's print the names too so we have all
the information in our screen I want to show you I want to show you something so I'm going to
print sorry this wasn't names this was names_dict and now let's wait a couple of seconds I want
to show you not only the probabilities but also the class names so it's a little more clear what
exactly I'm going to show you now so this means that this number is the probability for this
image to be cloudy right this other number is the probability for this image to be rain this
other number is the probability to be shine and then this last number is the probability to be
sunrise and you can see by the values that we are definitely classifying this image as Sunrise
right because this is almost a one this is almost like a super super confident and absolutely
confident classification so this is exactly the category we are classifying for this image and
this is how to make sense of this information so what I'm going to do now is to print names_dict
and then I'm going to call np dot argmax and then I'm going to input the probability list
I just showed you and obviously I need to import numpy as np otherwise it's not going to work and
basically what we are doing here is that we are looking at this list the one containing all
four probabilities we are taking the maximum number which in this case is this one and
we are taking the index of this maximum number so this is
index 0 this is one this is two and this is three right so from calling np dot
argmax on probs we are getting 3 and then we are looking up element 3 of the names_dict
dictionary so we go here and we see that 3 belongs to the sunrise category and if we look at this image
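To recap the lookup just described with a self-contained sketch: the category names and probability values below are made-up stand-ins for what the real results object gives you, not output from an actual model.

```python
# Made-up stand-ins for results[0].names and results[0].probs.tolist()
names_dict = {0: "cloudy", 1: "rain", 2: "shine", 3: "sunrise"}
probs = [0.001, 0.002, 0.003, 0.994]

# index of the maximum probability, the same index np.argmax(probs) would return
best_idx = max(range(len(probs)), key=lambda i: probs[i])
print(names_dict[best_idx])  # prints "sunrise"
```

With numpy installed you would write names_dict[np.argmax(probs)] exactly as in the video; the pure-Python max call above gives the same index.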
again we are going to see we are in fact plotting a sunrise let me show you so everything seems to
be working fine and this is going to be all for today this is exactly how you can train an image
classifier using yolo V8 in your own custom data and this is going to be all for this tutorial so in my previous videos I showed you how
to train an image classifier and an object detector using yolo V8 now is the time for
semantic segmentation I'm going to show you the entire process of how to train a semantic
segmentation algorithm using yolo V8 from how to annotate the data how to train
the model in your local environment and also from a google colab and finally a super super
comprehensive guide on how to validate the model you trained my name is Felipe welcome
to my channel and now let's get started so let's start with today's tutorial and the first
thing I'm going to do is to show you the data we are going to be using today so this is a
dataset I have prepared for today's tutorial and you can see that these are images of ducks we are
going to be using a duck dataset today and this is exactly what the images look like now for each one
of our images for absolutely every single one of our images we are going to have a binary mask we
are going to have an image a binary image where absolutely every single Pixel is either white or
black and absolutely every single white pixel is the location of our objects all the white pixels
are the location of the objects we are interested in in this case the objects are our Ducks so let
me show you an example so it's a little more clear what I mean regarding the white pixels being
the location of our objects so this is a random image in my data set this is a random image of a duck
and this is exactly its binary mask so take a look what happens when I align these two images
and when I apply something like a transparency you can see that the binary mask is giving us the
exact location of the duck in this image so this is exactly what it means that the white pixels are
the location of our objects so this is exactly the data I am going to be using in this tutorial
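As a toy sketch of this idea, here is a tiny made-up binary mask where 255 is a white pixel and 0 is a black one, and how the white pixels give you the object's location; this is only an illustration, not the tutorial's code.

```python
# Tiny made-up binary mask: 255 (white) marks object pixels, 0 (black) background
mask = [
    [0,   0,   0,   0],
    [0, 255, 255,   0],
    [0, 255, 255,   0],
    [0,   0,   0,   0],
]

# collect the (row, col) locations of every white pixel, i.e. the object
object_pixels = [(r, c)
                 for r, row in enumerate(mask)
                 for c, value in enumerate(row)
                 if value == 255]
print(object_pixels)  # the four pixel locations covered by the object
```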
let me show you from where I have downloaded this data set this is a dataset I have found in the
open images dataset version 7 let me show you this dataset super super quickly this is an
amazing dataset that you can use for many different computer vision related tasks for example if I
go to segmentation you can see that we have many many many different categories now we are looking
at a random category of phones this is for example a semantic segmentation data set of phones and
let me show you if I go here and I scroll down you can see that one of the categories is here
duck so this is for example the exact same data I am going to be using in this tutorial this is
the exact same duck dataset I am going to be using in order to train a semantic segmentation
algorithm using yolo V8 and obviously you could download the exact same data I am going to use in
this tutorial if you go to open images dataset version 7 you can just download the exact same
duck dataset I am going to be using today or you can also download another dataset of other categories
so this is about the data I am going to use in this project and this is about where you can
download the exact same data if you want to, now let me show you a website you can use in order
to annotate your data because in my case I have downloaded a dataset which is already annotated
so I don't have to annotate absolutely any of my images absolutely all of my images already have
their binary masks right I already have the masks for absolutely all the images in my data set
but if you're building your data set from scratch chances are you will need to annotate your images
so let me give you this tool which is going to be super super useful
in case you need to annotate your data it's called cvat and you can find it in cvat.ai and this is
a very very popular computer vision annotation tool I have used it I don't know how many times
in my projects and it's very very popular and it's very useful so I'm going to show you how to
use this tool in order to annotate your images so the first thing we need to do is to go to
start using cvat this is going to ask you to either register if you don't have a user already
or to log in right I already have a user so this has logged me into my account and now let me show
you what I'm going to do in order to annotate a few images actually I'm going to annotate only one
image because I am only going to show you how to use it in order to create a binary mask for your
project but I'm just going to do it with only one image because you only need to see the
process and that's going to be all so I'm going to projects I'm going to here to the plus button
create a new project the name of this project will be duck semantic sem seg this will be the name
of my project and it will contain only one label which is duck so I'm going to press continue
and that's pretty much all submit and open now I'm going to create a task this is already
yeah create new task the task name will be duck task zero one it doesn't really matter the name so
I just selected a random name then I'm going to add an image I'm just going to select
this image I'm just going to annotate one image so this is going to be enough and submit and open
so this is going to take a couple of seconds this is where you need to select all of your images
all the images you want to annotate but in my case I'm only going to select one so I'm going to
press here in Job so this is going to open the annotation job right now I'm going to show you how
you can annotate this image how you can create a binary mask for this image you need to go here to
draw new polygon then shape so I'm going to start over here and this is pretty much all we need to
do in order to create this semantic segmentation data set for this image right in order to create
the binary mask for this image you can see that I'm just trying to follow the contour of this
object and you may notice that the contour I am following is not perfect obviously this is
not perfect and it doesn't have to be perfect if you're creating a dataset if you are creating
the mask of an image if you are creating the mask of an object then it definitely doesn't need
to be pixel wise perfect right you need to make a good mask obviously but something like this as
I am doing right now will be more than enough so this is a very time consuming process you
can see and this is why I have selected only one because if I do many many images it's going
to take me a lot of time and it doesn't make any sense because the idea is only for you to
see how to annotate the images right so you can see that I'm following the contour okay and this
is an interesting part because we have reached the duck's hand or its leg or something like that
this part of the duck's body and you can see that this is beneath the water this is below the
water and this is where you're going to ask yourself do you need to annotate this part or not
do you need to annotate this part as if it's part of the duck or not because you could say yeah
it's definitely part of this duck but you are not really seeing a lot of this object right it's like
part of the water as well so this is where you're going to ask yourself if you need to annotate this
part or not and in my case I'm going to annotate it but you can do it either way in all of
those sections in all of those parts where you are not 100% convinced it's like a discussion
you could do it or you could not do it it's up to you so annotating a few images is always a good
practice because you are going to see many many different situations as I have just seen over
here right where I have just seen with this part of the duck which now I am super super curious
what's the name if you know what's the name of this part of the duck's body please let
me know in the comments below I think it's called hand right because it's something like
a hand they have over there but let me know if it has another name and you if you know
it please let me know in the comments below now let's continue you can see I'm almost there
I have almost completed the mask of this duck now I only have to complete this
beak or whatever it's called it seems I don't really know much about duck
anatomy I don't really know what is the name of this part either so anyway I have already completed
and once I have completed it I have to press shift N and that's going to be all so this is the mask this
is a binary mask I have generated for this object for this duck and this is going to be pretty
much all what I have to do now is to click save you can see that this is definitely not
a perfect mask this is not a pixel wise perfect mask because there are some parts
of this duck which are not within the mask but it doesn't matter make it as perfect as possible but
if it's not 100% perfect it's not the end of the world nothing happens so I have already saved this
image and what I need to do now is to download this data so I can show you how to download the
data you have just annotated in order to create your data set so this is what I'm going to do
I'm going to select this part this option over here and I'm going to export task data set and
then I'm going to select this option which is segmentation mask 1.1 I'm just going to
select that option and I'm going to click ok so that's going to be all we only
need to wait a couple of minutes and that's pretty much all the data has been
downloaded now I'm going to open this file and basically the images you are interested in
are going to be here right you can see in my case I only have one image but this is where
you're going to have many many many images and please mind the color you will get all these
images in right in my case I have downloaded this image in red it doesn't really matter just mind
that you could have something different than white but once you have all your images what
you need to do is to create a directory I'm going to show you how I do it I am going maybe
here and I'm going to create a temporary directory which I'm going to call tmp and this
is where I'm going to locate this image right and I'm going to create two
directories one of them is going to be masks and then the other one is going to be called
labels and you're going to see why in only a minute and this is where I'm going to locate
the mask here and then I am going to pycharm because I have created a script a python script
which is going to take care of a very very very important process we have created masks which are
images which are binary images and that's perfect because that's exactly the information we need in
order to train a semantic segmentation algorithm but the way yolo V8 works we need to convert
this image this binary image into a different type of file we are going to keep exactly the
same information but we are going to convert this image into another type of file so let
me show you how this is a python file I have created in order to take care of this process
and the only thing you need to do is to edit these fields this is where you're going to put
all the masks this is a directory which is going to contain all the masks you have generated
and this is going to be the output directory you can see that these two variables are already
named properly in my case because this is the tmp directory I have just created this is where
I have located the mask I have just generated with cvat and this is my output directory
so take a look what happens when I press play so the script has just been executed everything
is okay this is the mask I have input and this is the file which was generated from this
mask and this looks super super super absolutely crazy right it's a lot of numbers it's like
a very very crazy thing without going into the details let's just say that this is exactly the
same information we had here this is exactly the same information we have here but in
a different format let's keep that idea right exactly the same information in a different format
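Without claiming this is the exact script from the video, the file format it produces can be sketched like this: one line per object, the class id followed by the polygon points normalized by the image width and height; the image size and polygon below are made-up examples.

```python
# Made-up example: image size and a polygon outlining one object (a duck)
img_w, img_h = 640, 480
class_id = 0  # "duck"
polygon = [(100, 50), (200, 50), (200, 150), (100, 150)]  # pixel coordinates

# normalize every coordinate to the 0-1 range, as the label format expects
coords = []
for x, y in polygon:
    coords.append(x / img_w)
    coords.append(y / img_h)

# one label line: class id followed by the normalized polygon points
label_line = str(class_id) + " " + " ".join(f"{c:.6f}" for c in coords)
print(label_line)
```

In a conversion script like the one described, the polygons would come from the contours of the white regions in each binary mask (for example with cv2.findContours), and one such line per object would be written into a .txt file named after the image.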
and that's exactly the format yolo V8 needs in order to train the semantic segmentation
model so this is exactly what you need to do once you have created all of your masks you need
to download these files into your computer and then please execute this script so you can
convert your images into a different type of file and obviously this script will be available in
the GitHub repository of today's tutorial so that's pretty much all in order to create
your annotations in order to download these annotations and in order to format everything the
way you should now let me show you the way you need to structure
all of your file system so it complies with yolov8 remember this is something we have already
done in our previous tutorials regarding yolov8 once you have your data you need to structure
your data you need to format your data you need to structure your file system so yolo V8 finds
absolutely where everything is located right you're going to locate your images in a given
directory you're going to locate your annotations your labels in another directory so everything
is just the way yolov8 expects it to be right so let me show you I have a directory
which is my root directory which is called data within data I have three directories but this
directory the masks directory is not really needed it's just there because that's the way I
got my data this is the directory
containing all of my binary masks and in order to be more clear that it is not needed for this
part of the process what I'm going to do is to delete this directory right now it's
gone okay now we only have two directories and these are exactly the directories we need in this
part of this process where we are creating all the structure for our data so images you can see
that we have two directories one of them is called images the other one is called labels within
images we have two other directories one of them is called train and the other one is called val and
train is the directory where we are going to have all of our training data this is where we are
going to have all of our training images these are all the images yolo V8 is going to use in order
to train the model in order to train the semantic segmentation model then val also contains images
and these are the images we are going to use in order to validate the model right so remember you
need to have two directories one of them should be called train the name is very important it should
be called train and the other one should be called val now going back you can see that we have two
directories one of them is images the other one is labels and if I go within labels you can see
that there are two directories also they are named train and val and if I open these directories
these are the type of files I have generated with the exact same script I showed you a few minutes
ago so within labels we have two directories train and val and train are all the annotations
we have generated from the training data from the training masks right and long story short we have
our root directory within the root directory we have two directories one of them is called images
the other one is called labels within images we have two directories train and val within
train and within val it's all of our data all of our images and within labels it's exactly the
same structure two directories train and val and within train and within val it's where we locate
all of our annotations right that's exactly the structure you need for your data please remember
to structure your file system like this otherwise you may have an issue when you are trying to train
a semantic segmentation model using yolo V8 so that's pretty much all on how to
structure the data and now let's move to the interesting part let's move to the most fun
part which is training this semantic segmentation model now let's move to pycharm and I will show
you how to train it from your local environment so let's continue this is a pycharm project I
created for today's tutorial please remember to install this project requirements otherwise you
will not be able to use yolo V8 now let's go to train.py this is a python script I created and
this is where we are going to do all the coding we need in order to train the semantic segmentation
model using yolo V8 and now let's go back to the yoloV8 official repository because let's see
how exactly we can use this YOLO this model in order to train this semantic segmentation model
I'm going to the segmentation section and I'm going to click on segmentation Docs now this
is going to be very very straightforward I'm going to train I'm going to copy this sentence
which is load a pre-trained model and then going back to pycharm I'm just going to copy paste and
then I am going to from ultralytics import YOLO then I'm also going to copy this sentence
which is model.train I'm going to change the number of epochs to something like one
because remember it's always very very healthy it's always a very good idea to do like a very
dummy training to train the model for only one Epoch to make sure everything is okay to make
sure everything runs smoothly and then you do like a deeper training so I'm going
to change the number of epochs and then I'm also going to change the config file I'm going
to use this config file which is a config file I have over here and obviously you will find this
config file in the repository of today's video so long story short you can see that you
have many many different keywords but the only one that you need to edit is this one right this
is the absolute path to your data in my case if I copy and paste this path over here you can see
that this is the directory which contains the images and the labels directories so long
story short just remember to edit this path to the path to the location of your data because
if you have already structured everything in the way I mentioned in the way I showed you in
this video then everything else will be just fine right the train and the val keywords are
fine as they are I mean you can just leave everything as it is but please remember to edit
this field which is the location of your data now going back to train.py this is pretty much
all we need to do in order to train the semantic segmentation model so I'm just going to press
play and let's see what happens and you can see everything is going well we are training our model
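As a quick aside, a config.yaml along the lines just described might look like this sketch; the path below is an example and must point to your own data directory, and duck is the single class used in this tutorial.

```yaml
# sketch of a config.yaml — replace path with the location of your own data
path: /home/user/data   # root directory containing images/ and labels/
train: images/train     # training images live in <path>/images/train
val: images/val         # validation images live in <path>/images/val

names:
  0: duck
```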
but everything is taking forever everything is just going to take forever even though we are only
training this model for only one epoch everything is going to take a lot of time so what I'm going
to do instead is just press stop I'm going to stop this training everything is going well I'm not
stopping this training because I had an error or something no everything is going well but I am
going to repeat exactly the same process from a Jupyter notebook in Google Colab because
if I use Google Colab I'm going to have access to a free GPU and it's going to make the process
much much much faster so I am going to use Google Colab in order to train this model and
I recommend you to use Google Colab as well so I'm going to show you how to do it from
your Google Colab environment please remember to upload your data before doing anything in
Google colab please remember to upload your data otherwise it's not going to work for example
here you can see I have many directories one of these directories is data and within data you
have labels and images and these are exactly the same directories I have over here so I have
already uploaded my data into my Google Drive please remember to do it too otherwise you will
not be able to do everything we're going to do just now right so that's one of the things you
need to upload and then also remember to upload this config.yaml file the same file I showed you
in my local computer you also need this file here the only thing you will need to edit is this
path because now you need to specify the path the location of your data in Google Drive so I'm
going to show you exactly how to locate your data into your Google Drive and now let's move to the
Jupyter notebook obviously I'm going to give you this notebook this is going to be in the GitHub
repository of today's tutorial so you can just use this notebook I'm just going to show you how
to execute absolutely every single cell and how everything works right and exactly how everything
exactly what everything means right exactly what are you doing absolutely every single cell so the
first thing I'm doing is just connecting my Google Colab environment with Google Drive because
remember we need to access data from Google Drive so we definitely need to allow Google Colab to
access Google Drive so I'm just going to select my account and then I scroll all the way down and
press allow that's going to be pretty much all we need to wait a couple of seconds and now let's
continue what I'm going to do now is to Define this variable which is data dir and this is
the location of my data in my Google Drive now please mind this path this location because please
mind the way this is structured right please mind the first word is content then gdrive then my
drive and then it's the relative path to my data so if you want to know exactly where you have uploaded
your data if you're not completely sure where you have uploaded your data what you can do is to do
an ls like I'm doing right now and it's going to give you all the files you have in the root
directory of your Google Drive then from there just navigate until the directory where you have
uploaded your data in my case is my drive computer vision engineer image segmentation yolo V8 and then
data that's exactly where my data is located in Google Drive if I go to this directory you can see
that this is my drive then you can see that this is my drive then computer vision engineer image
segmentation yolo V8 and then data and this is exactly what I have over here so once you have
located your data the only thing you need to do is to edit this cell and to press enter so everything
is ready now I'm going to install ultralytics so I can use yolo V8 from The Notebook and this
is going to take a few seconds but this is going to be ready in no time, something you need to
do from your Google colab is to go to runtime and change runtime type just make sure it
says GPU just make sure you are using Google Colab with a GPU because if you are not using
Google Colab with a GPU everything is pretty much pointless right so just remember to check if
you are using Google Colab with a GPU or not just do it before you start all this process because
otherwise you will need to run absolutely everything again so let's continue I have already
installed ultralytics and now I am going to run this cell and if you realize this is exactly
exactly the same type of information the same code I have over here in my local environment right I'm
just defining a model and then I am just training this model so what I need to do now is just
press enter and also mind that I have specified the config file right the location of my config
file and now I'm going to run a full training or actually I'm going to run a training for 10 epochs
so this is what I'm going to do and this is also going to take some time although we are going to
use a GPU this is going to take a few minutes as well so what I'm going to do now is just I'm going
to wait until this is completed and I'm going to pause my recording here and I'm just going to fast
forward this video until this process is completed okay so the training process is now completed we
have trained this model and everything is just fine and you can see the results have been saved
here under runs segment and train2 so the only thing we need to do now is to get the results we
got from this training we need to get the weights we need to get all the results all the different
metrics or the different plots because what we need to do now is to analyze this training process
we need to validate that everything is just fine right so what we are going to do now is to get
all this information and the easiest way to do it is just running this command what we will do
when running this command is copy all the content in this directory
where the results have been saved under our Google Drive right remember to edit this URL, remember
to edit this path, this location, because you want to copy everything into a directory into
your Google Drive so just make sure everything is okay make sure this location makes sense and you
can just execute this cell and you're going to copy everything into your Google Drive now let me
show you my Google Drive I have already executed this cell so everything is under my Google Drive
this is the runs directory which was created when I ran that cell and under this other directory which is
called segment we have train2 so these are all of our results these are the results we are now
going to analyze so what I'm going to do now is just to download this directory and once we have
this directory into our local computer then we are going to take a look at all the plots at all the
metrics and I'm going to tell you exactly what I usually do in order to validate this training
process so everything is now downloaded everything is now completed and let's take a look at these
files so what I'm going to do is I'm just going to copy everything into my desktop I need to do some
cleaning by the way so these are all the results we got from this training process you can see that
this is a lot of information this is definitely a lot of information right we have many many
different files we have many different everything we have the weights over here we have a lot of
information so let me tell you let me give you my recommendation about how to do this evaluation
how to do this validation from all these plots and from all of these results I would recommend you
to focus on two things one of them is this plot one of them is all of these metrics
and then I'm also going to show you how to take a look at these results at these predictions
from these images but for now let's start here you can see that this is a lot of information
these are a lot of metrics and you can definitely knock yourself out analyzing all the
information and all the plots you have here but I'm going to show you a very very simple and a very straightforward way
to do this analysis to do this validation this is something that I have already mentioned in my
previous videos on yolo V8 on how to train a model and how to validate this model which is take
a look what happens with the loss function take a look what happens with your loss plots with all
the plots which are related to the loss function and as this is a semantic segmentation type
of algorithm I would tell you to take a look at what happens with the segmentation loss, take a look at what happens
with the training loss and the validation loss and long story short just make sure the loss
function goes down right if your loss function is going down it's likely things are going well it's
not a guarantee, it may happen that the model doesn't really perform
that well anyway, but I would say that if the loss function is going down it's a very good sign
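This "trend goes down" check is easy to eyeball on the plots, but you can also do it numerically from the results.csv file saved in the run directory. A rough sketch; the column name "train/seg_loss" is what my run's results.csv used (ultralytics pads the headers with spaces, hence the strip), so double-check against yours:

```python
import csv

def is_trending_down(values):
    """Rough trend test: is the mean of the second half of the curve
    lower than the mean of the first half? A few epochs where the
    loss ticks up won't flip the answer."""
    mid = len(values) // 2
    first = sum(values[:mid]) / mid
    second = sum(values[mid:]) / (len(values) - mid)
    return second < first

def read_loss_column(results_csv, column="train/seg_loss"):
    """Pull one loss column out of a YOLOv8 results.csv file."""
    with open(results_csv) as f:
        rows = list(csv.DictReader(f))
    # headers come padded with spaces, so strip them before the lookup
    return [float({k.strip(): v for k, v in row.items()}[column]) for row in rows]
```

With those two helpers, `is_trending_down(read_loss_column("results.csv"))` gives a quick yes or no on the training segmentation loss, and the same call with the validation column covers the validation side.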
if on the contrary your loss function is going up I would say you have a very very serious problem
there is something seriously wrong with your training process or with your
data or with your annotations, and I'm talking about something seriously
wrong right if your loss function is going up I don't know what's going on but something is
going on do you see what I mean so having a loss function which is going down yeah it's not
a guarantee of success I mean it's not like it's a good model for sure no you may have a
situation where you haven't trained a good model and your loss function is going down anyway but I
would say that it's a very very good sign at the very least your training loss and your validation
loss should go down and I'm talking about a trend of going down right for example here we have a
few epochs in which the loss function is going up that's okay that's not a problem we are looking
for a trend we should have a trend for the loss function to go down and that's exactly what
we have in this situation so long story short that's my recommendation on how to do this
validation how to do this analysis on all the metrics we have over here for now focus on these
two and make sure they are going down and then the next step to continue with
this validation is to take a look at what happens with our predictions how
is this model performing with some data with some predictions and for this we are going to take
a look what happens with all of these images right you can see that these are some batches
and these are some of our labels some of our annotations for all of these images and then
these are some of the predictions for these images right so we are going to take a look what happens
here and for example I'm going to show you these results, the first image, and you can see
that looking at this image which again these are not our predictions but this is our data these are
our annotations these are our labels you can see that there are many many many missing annotations
for example in this image we only have one mask we only have the mask for one of the ducks
we have one two three four five ducks but only one of them is annotated we have a similar behavior
here only one of the ducks is annotated here is something similar only one of them is annotated
and the same happens for absolutely every single one of these images so there are a lot of missing
annotations in this data we are currently looking at and if I look at the predictions now these are
the same images but these are our predictions we can see that nevertheless we had a lot of missing
annotations the predictions don't really look that bad right for example in this case we are
detecting one two three of the five ducks so we have an even better prediction than the annotation we have
over here I would say it's not a perfect detection it's
not 100% accurate but it's very good and I would say it's definitely better than the
data we used to train this model so that's what happens with the first image and if I take a look
at the other images I can see a similar Behavior right this is the data
we used for training this algorithm and these are the predictions we got for these images and
so on right it seems to be exactly the same behavior exactly the same situation for
this image as well so my conclusions by looking at these images by looking at these predictions
is that the model is not perfect but I would say performs very well especially considering that
the data we are using to train this model seems to be not perfect seems to have a lot a lot
of missing detections have a lot of missing elements right a lot of missing objects so
that's our conclusion that's my conclusion by looking at these results and that's
another reason for which I don't recommend you to go crazy analyzing these plots because when
you are analyzing these plots remember the only thing you're doing is that you are comparing your
data the data you are using in order to train this model with your predictions right the only thing
you're doing is comparing your data with the predictions you got from
the model right so as the only thing you are doing is a comparison between these two things then
if you have many missing annotations or many missing objects or many other errors
in the data you're using to train the algorithm then this comparison is a little
meaningless right if you're comparing one thing
against another but the thing you are comparing with has a lot of errors and a lot of
missing objects then the comparison doesn't make a lot of sense whatsoever right
that's why I also recommend you to not go crazy when you are analyzing these plots because they
are going to give you a lot of information but you are going to have even more information
when you are analyzing all of these results and this is a very very very good example of what
happens in real life when you are training a model in a real project because remember that building
an entire dataset, a dataset which is 100% clean and absolutely 100% perfect is very very very
expensive so this is a very good example of what happens in real life usually the data you're using
to train the model, to train the algorithm has a few errors and sometimes there are many many many
errors so this is a very good example of how this validation process looks like with data which
is very similar to the data we have in real life which in most cases is not perfect my conclusion
from this evaluation from this validation could be that the next step is to take a look at what's going
on with the data and to improve it and by looking at these results
one of the ways in which I could improve this data is by using the predictions I'm getting instead
of the annotations I used to train this model you see what I mean if the
predictions we are getting are even better than the annotations maybe our next step will be to use
these predictions in order to train a new model do you see what I mean so by analyzing all of
these results you are going to make decisions on how to move forward on how to continue and
this is a very good example of what this process looks like in a real project this is pretty much
how it works when you are working on a project either
in a company or if you're a freelancer and you're delivering a project for a client this is pretty
much what happens right there are errors things happen and you need to make a decision given all
the information you get with all this analysis so that's going to be all in order to show you
this very simple and very straightforward way in order to validate this training process in order
to make some conclusions regarding what's going on right and to make some decisions about
how to move forward with this project or this training process and now let me show you
something else which is within this directory the weights folder this is where your weights will be
located right because if you are training a model it's because you want to have a model in order to
make predictions in order to make inferences and this is where your models will be located this
is where your model will be saved and this is something I have already mentioned in one of my
previous videos regarding yolo V8 remember you will have two models one of them is called last.pt
another one is best.pt and the way it works is that remember that when you are training a model at
the end of absolutely every single Epoch you are updating your weights you are updating your model
so at the end of absolutely every single epoch you already have a model which is available
which you can use if you want to so last.pt means that you are getting the last model
the model you got at the end of your training process so in this case I am training a network
for 10 epochs if I remember correctly so this is the model we got at the end of the tenth Epoch and
then best.pt means that you are getting the best model the best model you trained during the entire
training process if I show you the metrics again let's see the metrics over here you can see that
we have many metrics which are related to the loss function and then other metrics related to the
accuracy on how this model is performing and the way yolov8 decides what's the best model
in this case which is a semantic segmentation type of problem may be related to the loss function
maybe it's taking the model at which you got the minimum loss or it may be related to some of these
metrics which are related to the accuracy to the performance maybe it's taking
the model for which you got the maximum Precision for example or the maximum recall or something
like that I'm not 100% sure I should look at the documentation but the way it usually goes is
that last.pt is the last Model you trained so it's at the end of your training process and then
best.pt is your best model and this best model is decided under some criteria so that's basically
how it works and what I usually do is take last.pt because I consider that last.pt is
considering way more information because it has seen much more
data right in the whole training process we are doing many many
different things so if you take the last model you are summarizing way more information that's
the way I see it so usually I take the last model usually I take last.pt and that's pretty
much all in order to show you what validating this model looks like and now
let's move to the prediction let's see how we can use this model in order to make inferences in
order to make predictions so let's see how we can do that so let's go to pycharm let's go to
the pycharm project of today's tutorial and this is a python script I created in order to do these
predictions this python file is called predict.py and this is what we're going to do I'm going to
start importing from ultralytics import YOLO and then I am going to define the model path the model
we are going to use which in our case let's use last.pt from these results, from this directory,
so I am going to specify something like this... last.pt and then let's define an image path let's
define the image we are going to use in order to get our inferences so the image will be located...
this will be from the... from the validation set I'm just going to choose a random image something
like this one so I am going to copy paste I am just going to paste it here so this
is the image we're going to use in order to test this model in order to make our predictions
and now I'm going to import CV2 as well because I'm going to open I'm going to read this
image and then I am going to get this image shape so this will be something like this this will be
image and then this is image.shape okay and now the only thing we need to do is to get our model
by doing something like YOLO and then model path okay and then we are going to get the results by
calling model of our image right and this is it this is all we need to do in order to get our
results in order to get our inferences but now let's do something else I am going to iterate for
result in results and now let's take a look at this mask let's take a look at this prediction
so I'm going to iterate like this for j, mask in result.masks.data and then I am
going to say something like mask.numpy() times 255 and this is our mask
and then I am going to resize it to the size of the image so I'm going to input the
mask and then this will be, if I am not mistaken, the order is this one W and then H so this is just
the way it works this is how we need to do it in order to get the prediction and then in order
to resize this prediction back to the size of the original image so this is how it goes and now the
only thing we need to do is to call CV2 imwrite and I'm going to save it I'm going to save it here
and the name will be something like that let's call it output this is only a test so we don't
really need to go crazy with the name let's call it output.png and this will be our mask and that's
pretty much all that's pretty much all let's see what happens I'm going to press play Let's see if
everything is okay or if we have some error okay so I did get an error and yeah this is because
we need to use enumerate I forgot the enumerate we are not actually using j so I could just iterate
over the masks but let's do it like this okay everything ran smoothly everything is okay we didn't get any
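Assembled from the steps above, here is roughly what predict.py ends up looking like. The model and image paths are placeholders for your own files, and the inference part lives in main() so nothing heavy runs on import:

```python
import numpy as np

def mask_to_uint8(mask):
    """Scale a float mask with values in [0, 1] to an 8-bit image."""
    return (np.asarray(mask) * 255).astype(np.uint8)

def main():
    # imported here so the helper above is usable without these installed
    import cv2
    from ultralytics import YOLO

    model_path = "runs/segment/train2/weights/last.pt"  # placeholder path
    image_path = "duck.jpg"                             # placeholder image

    model = YOLO(model_path)
    img = cv2.imread(image_path)
    H, W, _ = img.shape

    for result in model(img):
        for j, mask in enumerate(result.masks.data):
            out = mask_to_uint8(mask.numpy())
            out = cv2.resize(out, (W, H))  # cv2.resize takes (width, height)
            cv2.imwrite(f"output_{j}.png", out)
```

Calling main() writes one output_N.png per detected mask next to the script.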
error and now if I go back to this folder to this directory I can see this is the output this is the
output we got and now in order to make absolutely 100% sure everything is okay and this is a good
mask this is a good prediction I'm going to make an overlay I'm very excited I don't know if you
can tell but I'm very excited I'm just going to take this image over here and then I'm going back
here and I'm going to take the original image I'm going to do an overlay so this will be raise to top
I'm going to align these two images together and now let's make a transparency and let's see what
happens and you can see that we may not get like a 100% perfect mask but it's pretty good it's like
a very very good mask especially considering the errors we detected in our data so this is amazing
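The overlay I just did by hand in an image editor can also be done in code. A sketch with plain NumPy, painting the masked pixels in green with an adjustable transparency; the colour and the alpha value are arbitrary choices of mine, not anything from the video:

```python
import numpy as np

def overlay_mask(image, mask, alpha=0.5):
    """Blend a binary mask (H, W) onto a BGR image (H, W, 3): pixels
    where the mask is set are mixed with green, the rest untouched."""
    out = image.astype(np.float32)
    green = np.array([0.0, 255.0, 0.0], dtype=np.float32)
    region = mask > 0
    out[region] = (1 - alpha) * out[region] + alpha * green
    return out.astype(np.uint8)
```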
this is a very good detection this is a very good result so this is going to be
all for this tutorial and this is exactly how you can train a semantic segmentation model using
yolo V8 and this is the entire process from how to annotate the data how to train the model and how
to validate this model and then how to make some predictions so this is going to be all for today. Now this is exactly what you will be able to do
with today's tutorial in this video we're going to work with pose detection using yolo V8 and
I'm going to show you the entire process from how to annotate your custom data for free using a
computer vision annotation tool how to prepare your data and your file system for training
this pose detector how to do the training in your local computer and also from a Google collab and
how to do a super comprehensive evaluation of the model you trained this is a much more complex
problem in my previous tutorials I showed you how to train an image classifier using yolo V8 an
object detector and an image segmentation model and I would say that today's model this keypoint
detector is much more complex than everything we did before this is going to be an amazing tutorial
my name is Felipe welcome to my channel and now let's get started and now let me show you the data
we are going to use in this tutorial we're going to use the AwA pose dataset and let me show you
exactly how this data looks like so you can see that these are pictures of many different animals
currently we are looking at antelopes these are pictures of many different antelopes and if I
scroll down in this directory you are going to see I also have other animals for example here this is
a bobcat which is some sort of feline some sort of cat you can see that these are many different
pictures of this animal and if I scroll down a little more you are going to see I also have
buffaloes so we also have pictures of buffaloes and if I continue scrolling down you are going to
see other pictures of other animals for example here I have a Chihuahua and you get the idea
right we have pictures of many many many different animals and all these animals are quadrupeds
because this is a quadrupeds keypoint detection dataset now let me show you the key points
we are going to be detecting for each one of these animals and you can see that these are many
many different key points we have 39 key points in total which is a lot and we are detecting many
different parts for example the nose the eyes the jaw the tail the legs and also the ears the horns
or whatever they're called something like antlers it doesn't matter we are detecting many many
different parts in these quadrupeds so this is exactly the data we are going to be using today
I thought it was like a very very cool dataset to use in pose detection and now let's continue so I'm
going to show you how to do the entire process of training a pose detector using yolo V8 on your
custom data and in my case the data I am going to use in this tutorial is already annotated right
so I already have the annotations for this data but if you are training this pose detector on
your custom data then most likely you will need to annotate the data yourself so I'm going to
show you how you can do that I'm going to show you how to do the entire annotation process
using CVAT which is a very very popular and a very awesome annotation tool for computer vision
and let me show you how to do it so I'm going to go to cvat.ai this is CVAT's website and I'm going
to click here where it says start using cvat I'm going to show you how to create a project how
to create a task and how to do all the annotation now I'm going to Projects and I'm going to
click the plus button I'm going to click here and create new project and this is going
to be keypoint detection this is going to be quadruped keypoint detection which is exactly
what we are going to be doing then add label and I'm going to add quadruped continue
and that's pretty much all submit and open this is where you are going to add absolutely
all the labels you have in your custom data in my case I only have one label which is quadruped
now let's continue now I'm going to create a task create new task the name of this task will be
something like quadruped keypoint detection task zero zero one and I am going to add an
image I'm going to show you how to annotate this data with only one image
so I'm only going to select the first one and then I'm going to click here in submit and
continue we have to wait a couple of minutes until the data is uploaded into the server and
once everything is completed we need to go to tasks this is our project and this is the task we
have just created and I'm going to click on open so this is pretty much all now I'm going to
click here this is going to open the task and now we need to start our annotation process so
you need to click here where it says draw new points and you need to select the number of
points you are going to annotate in my case I'm going to annotate 39 points but you need to select
as many points as you are going to annotate so now I'm going to click here in shape and we need to
start our annotation process and something that's very very very important is that once you are
annotating your data you need to follow a given order right once you are annotating all of your
key points you need to follow a given order with your key points if I show you this image again
you can see that we have many many different key points we have the location of all the key
points but we don't really have any information regarding the order of these key points right
this is very very important because you cannot follow just any random order you need to
always follow the same order when you are annotating your data so this is
for example the order I am going to follow in this tutorial you can see that the first key point
I'm going to annotate is nose then upper jaw then lower jaw mouth end right and so on right
you need to specify a given order for your data now I'm going to start this annotation process
so the first point is nose which I'm going to set over here then the next one is upper jaw which
is going to be something like this lower jaw here mouth end right and this is the right from
the perspective of this animal right so this is going to be here now mouth end left and I don't
really see the mouth end left but I'm going to say it's around here and I'm going to share a
few comments later on this tutorial regarding the visibility of our key points right but for now
let's just continue now the next one is right eye then right ear base which is here and then
right ear end which is over here and I'm just going to continue with all of this list and I'm
going to resume this video when I'm completed and these are the last two body middle right
which is around here and body middle left which is around here I don't see it but it's around here and
you can see that this is all these are my 39 key points and now let me show you how you can export
this data but before that please remember to click save, it's always a good practice
to click save, and not only do you need the key points but you also need to draw a bounding box around
your object this is very very very important and I'm going to tell you why in a few minutes but for
now remember that not only do you need to annotate all of your key points but you also need to draw
a bounding box enclosing your object so this is how I did it and I'm going to click save again
this is the only image I'm going to annotate but please remember to follow exactly the same
process for all of your images I'm now going to tasks and I'm going to show you how to export this
data you need to click here and Export task dataset now you need to click here and you can see that
there are many many different options in which you can export your data and one of these options is
COCO Keypoints 1.0 and this is very important because this is the exact format we need for our
data but I have tried to export the data into this format and it's not working for some reason it's
not working so I'm going to show you how to do it in cvat for images 1.1 so click here then okay
and then you just have to wait until everything is downloaded once everything is fully exported
you are going to see a file a zip file and within this file there will be another file called
annotations.xml now let me open this file so I can show you how it looks like you are going to
see something like this and at the bottom of this file you are going to see all of your annotations
and all the images you have annotated and their annotations right so this is exactly the
data you are going to generate using cvat now let me show you something else I have created
a python project for today's tutorial and let me show you a script I created in this python project
and this script will be super super super useful because now that you have your annotations now
that you have your data you need to convert your annotations into the exact format you need
in order to use this pose detector using YOLO V8 so let me show you basically you need to specify
two variables one of them is the location of your annotations.xml file and you also need to specify
the location of the directory where you want all your data to be saved right this script is going
to parse through this XML file and it's going to extract all
of your annotations and it's going to save all of your annotations into the exact format you need
in order to use yolo V8 so remember to specify these two paths these two variables one of them is
the location of your XML file and then where you want all of your newly created annotations to be
saved right where you want this output directory so once you have set these two variables the only
thing you need to do is to run this script and everything will run super super smoothly and
remember this script will be available in The github repository of today's tutorial so you
can just go ahead and use it in order to convert all of your data into the format you need to use
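The actual conversion script ships with the repository, but the core parsing idea can be sketched in a few lines. The element and attribute names below follow the "CVAT for images 1.1" dump I got, an annotations root with per-image box and points children; the sample XML is a made-up miniature, so double-check the structure against your own annotations.xml:

```python
import xml.etree.ElementTree as ET

# hypothetical miniature of a "CVAT for images 1.1" export
SAMPLE = """<annotations>
  <image id="0" name="antelope_001.jpg" width="640" height="480">
    <box label="quadruped" xtl="100" ytl="120" xbr="400" ybr="440"/>
    <points label="quadruped" points="150.0,200.0;160.0,210.0"/>
  </image>
</annotations>"""

def parse_cvat(xml_text):
    """Collect (name, width, height, bbox, keypoints) per annotated image."""
    out = []
    for img in ET.fromstring(xml_text).iter("image"):
        w, h = int(img.get("width")), int(img.get("height"))
        box = img.find("box")
        bbox = tuple(float(box.get(k)) for k in ("xtl", "ytl", "xbr", "ybr"))
        pts = [tuple(float(v) for v in p.split(","))
               for p in img.find("points").get("points").split(";")]
        out.append((img.get("name"), w, h, bbox, pts))
    return out
```

From here the remaining work is normalising everything by the image width and height and writing one txt file per image.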
yolo V8 and now let's continue now I'm going to show you how you need to format how you need to
structure all of your data and your file system so it complies with yolov8 so you can see that
this is a directory which is called data and this is the root directory where my data is located you
need a directory which will be the root directory where your data will be saved where your data
will be located within this root directory you can see I have two folders one of them is
called images and the other one is called labels it's very important that you name these two
folders exactly like this one of them should be called images and the other one should be
called labels that's very important now if I open one of these folders you can see I have two
other folders one of them is called train another one is called val and it's very important that
you name these directories exactly like this one of them should be called train and the other
one should be called val So within train is where we will have all of our training data all of
our training images right you can see that these are all of our images which are all the images
we are going to use as training data and within val it's exactly the same these are all the images
we are going to use as validation data as our validation set right so within images we have two
directories one of them is called train the other one is called val and within each one of these
directories is where we have all of our data all the data we are
going to use in order to train this model all the images we are going to use in order to train
this model but we also have additional data which are the labels now let me show you how this other
folder looks like you can see that within labels we also have two directories which are also
called train and val and it's very important that you name these two directories exactly like
this one of them should be called train and the other one should be called val and if I open
the train directory you can see that we have many many many txt files and these are basically
all of our labels for the training data for all of our training images if I go back to images train
you can see that for absolutely every single one of these images we have an annotation file right
for absolutely every single one of these images we are going to have a txt file in this folder
and now let me show you for the other directory for val it's exactly exactly the same but for the
validation data for the validation images right so if I go back again you can see that we have
the root directory then images labels within images we have two directories train and val and
within each one of these directories is where we have all of our images and if we go to labels we
have also two directories train and val and within each one of these directories is where we have all
of our labels so this is exactly how you need to structure your file system and now let me show you
one of these annotations files one of these labels files from the inside let me show you how they
look like so this is a random annotations file this is a random txt file and this is exactly how you
need to put all the data inside these files the annotations are specified in the COCO keypoint
format which is a very popular format for pose detection now let me show you something I'm going
to do something, obviously I'm not going to save the changes, but this is going to be much
better in order to show you how this annotation format works right how it looks so basically
you can see the first number is a zero and this is our class ID in my case I only have
one class which is quadruped so in my case this number will always be zero but if you are making
this project and you have many many different classes please remember that this number should be
the class ID so if you have different classes you will have different numbers here now the next four
numbers are the bounding box of your object right remember in cvat when we were annotating this
data I showed you that not only we need to annotate the key points but we also need to annotate the
bounding box right and we annotated the bounding box so these four elements the four elements
that come after the class ID are the bounding box right and this bounding box is specified in
the yolo format which is the X and Y position of the center of the bounding box and then the
width and then the height of your bounding box this is very important so these two
numbers are the X Y position of the center of your bounding box and then the width and the height
and then all of the other numbers let me show you, you can see that we have these two numbers
which are a float and then we have the number 2 and then we have exactly the same two numbers and
another 2 then two numbers and another 2 then we have three zeros right this looks like very
very strange so now let's go back to my browser because I want to show you this website which is
cocodataset.org and this is where we are going to see exactly how this format works so if I go back
to the keypoint detection section you can see that this is an explanation about how this format works and
if I read something which is here you can see that absolutely every single key point will be
specified as X and Y and a visibility flag V so this means that for absolutely every single key
point we are going to have three values we are going to have the X and Y position of that given
key point and we are also going to have another value which is V which is the visibility right
remember I said we were going to talk about visibility later on in this tutorial well this is
that moment so you can see that V has three possible values V could be zero and this means
that the key point is not labeled and in this case X and Y are going to be 0 too or V equals 1 and
this means the key point is labeled but it's not visible or V could be 2 and this means the key
point is labeled and it's also visible and if we go back to this file to the annotations you
can see that if we start over here we have two numbers and then we have a 2 which means this key
point is annotated, is labeled, and is also visible now if we continue you can see that we have two
numbers and then we have another two which means this other key point is also visible now if we
continue you can see exactly the same two numbers and then a two and then if we continue you can see
that this... we have three zeros and we are in this situation right V equals zero so we also have x
and y equal to zero and this means the key point is not labeled for this image right so long story
short after the bounding box all the other numbers will be the key points and you will have two
values for the X and Y position and then the third value will be the visibility of that given key
point now this is one of the possible formats in which you could format your data and this is going
to work just fine but YOLO V8 also supports a keypoint annotation with only two values which means
that if you don't have the visibility information for all of your key points then it doesn't matter
because yolo V8 also lets you input your key points with only the X and Y coordinates so long
story short we have the first number which is the class ID then we have four numbers which are the
bounding box and then all of the other numbers are the key points and you can specify your key
points with three coordinates for every key point which means we have the X and Y and also the
visibility for that key point or you can specify all of your key points with only two coordinates
which means its the X are the Y coordinate of that given key point so this is the way you need to
label your data is the way you need to structure all of your annotations and please remember to
do it this way otherwise it's not going to work so now I'm just going to press Ctrl z because
obviously I'm not going to save all of those changes and that's pretty much all about how
to format your data how to format your file system and how to put your data into the exact format
you need in order to train this pose detector using yolov8 and now let's go back to pycharm
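As a quick sketch of the format just described, here is how one annotation line breaks down when parsed, assuming the X, Y, V layout (the numbers are made up for illustration):

```python
def parse_pose_label(line):
    """Split one YOLOv8 pose label line into class id, bbox and keypoints.

    Layout: class_id, then 4 normalized bbox values (x_center, y_center,
    width, height), then keypoint triplets (x, y, visibility).
    """
    values = [float(v) for v in line.split()]
    class_id = int(values[0])
    bbox = values[1:5]
    rest = values[5:]
    # Group the remaining numbers into (x, y, v) triplets.
    keypoints = [tuple(rest[i:i + 3]) for i in range(0, len(rest), 3)]
    return class_id, bbox, keypoints

# Example line with two keypoints: one visible (v=2), one not labeled (v=0).
line = "0 0.50 0.40 0.30 0.60 0.62 0.21 2 0.0 0.0 0"
cls, bbox, kpts = parse_pose_label(line)
print(cls, bbox, kpts)
```

If you use the two-value X, Y format instead, the same idea applies with a step of 2 instead of 3.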
let's go back to the pycharm project I created for today's tutorial and the first thing you need to
do if you want to train this pose detector using yolo V8 is to install the Project's requirements
which is basically ultralytics so please remember to install this package before starting with this
training because otherwise you will not be able to train a pose detector using yolo v8 so once
you have installed ultralytics let's go back here to this file I created which is train.py I'm going
to show you exactly what you need to code in this file in order to do your training and in order
to do so let's go back here which is ultralytics website and let's go to the pose page and let's
scroll down until this section over here and the only thing I'm going to do is I'm going to copy
and paste this line and then I'm going to copy and paste this other line right so this is basically
all we need to do in order to train this model and obviously I need to import from ultralytics import
YOLO and that's pretty much all so this sentence over here we can just leave it as it is we can
just leave it in this default value but this one I am going to make a couple of changes I'm going
to change the number of epochs I'm going to train for only one Epoch for now and I'm also going to
change the location of the configuration file I'm going to use this file which is
config.yaml and now I'm going to show you how this config.yaml looks like so you can see that this is
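For reference, the resulting train.py is only a few lines. This is a sketch assuming the nano pose checkpoint (yolov8n-pose.pt); the exact model file you copy from the Ultralytics docs may differ:

```python
# train.py -- minimal YOLOv8 pose training script (sketch).
def main():
    from ultralytics import YOLO

    # Load a pretrained pose model; "yolov8n-pose.pt" is an assumption here,
    # any of the yolov8*-pose checkpoints works the same way.
    model = YOLO("yolov8n-pose.pt")

    # Train on the custom config; 1 epoch just to verify the pipeline runs.
    # A real training needs far more (I use 100 later in the video).
    model.train(data="config.yaml", epochs=1, imgsz=640)

if __name__ == "__main__":
    main()
```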
Now let me show you what this config.yaml looks like. This is the configuration file I'm using in this tutorial; it has three sections: one for the data, then the keypoints, and then the classes. Let's go to the data section first. This is where you specify all the locations of your data, your images, and your labels. Basically, you need to specify the root directory, the directory containing your data, which in my case is this one; remember, the root directory is the one that contains the images and labels folders. Then you need to specify the location of the training images and the validation images. If you have set everything up as I showed you a few minutes ago, you can leave these two lines at these values and everything will work just fine; the only thing you need to edit is the location of your root directory. Now let's go to the keypoints section, where we have two keywords,
kpt_shape (keypoint shape) and flip_idx (flip index). These two keywords are completely new for us; we haven't seen them in any of my previous YOLOv8 tutorials. In my case kpt_shape says [39, 3]: that's because I have 39 keypoints and I'm using the X, Y, V format, with three values for every single keypoint, which is why there is a 3 over here. So the first number is how many keypoints you have in your data, and the second is which format you are using: if you use the X, Y format you specify a 2, and if you use the X, Y, V format you specify a 3, as I'm doing here. That's kpt_shape. Now let me explain what flip_idx is, and to help explain what this keyword means I made a drawing, which I'm
going to use to show you exactly what it means. This is a random image from my dataset — actually, it's the same image I used to show you how the annotation process looks for this data — and you can see it's a quadruped with all of its keypoints drawn on top. Now let me show you what happens if I flip this image horizontally: this is what I get. It's exactly the same image, only flipped horizontally. When you flip an image horizontally, everything that used to be on one side is now on the other side: everything that was the right side is now the left side, and the other way around. That's all a horizontal flip does. But remember, we have many different keypoints, and many of them are tied to one of the sides: we have a keypoint for the right eye, keypoints for the right ear and the right legs, and the same for the left eye, the left ear, and the left legs. Since many of our keypoints are side-specific, if we flip the image horizontally we need to do something with all of those side-specific keypoints. When we train a model with YOLOv8,
one of the stages in the training process is something called data augmentation. Data augmentation means we take the data and apply different transformations to it, and one of those transformations involves flipping the image. So some of our images will be flipped at random, and every time a horizontal flip happens, we end up in the situation I just described. Now let's go back to the list of all the different keypoints in this dataset — remember, I already showed you this list while annotating this image, and it starts with the nose, then the upper jaw, then the lower jaw, and so on. You can see that some of these keypoints are related to the right side, some to the left side, and others are not related to any side at all — neck base, neck end, throat, back: these are generic keypoints. We need to do something with every keypoint that belongs to one of the sides, for example these two, then all of these over here, and so on. That's exactly what the flip_idx keyword does; that's the idea, the intuition behind it. So let's go through the
list. The first element is the nose, and if we think about a nose, it's right in the middle, so nothing happens to it when we flip the image: the nose remains the nose. The next element is the upper jaw — exactly the same, the upper jaw remains the upper jaw after a horizontal flip, and the same goes for the lower jaw. But when we get to this element, mouth end right, we have an issue: when we flip the image horizontally, the mouth end right becomes the mouth end left, and the next element, mouth end left, becomes the mouth end right. You get the idea: these two keypoints are swapped when we flip the image. Now look at the value of flip_idx: the first elements are 0, 1, 2, and then 4, 3. Instead of 3, 4, which would be the natural order, we have 4, 3 — we are swapping these two values, and these are exactly the indexes of those two mouth keypoints in the keypoint order. So, long story short, to fix the problem that horizontal flipping creates, we go through all of our keypoints, and every keypoint that belongs to the right side must be swapped with its left-side counterpart, and vice versa. That's the only thing we need to do, and that's what we specify in this list: it describes how the flipping will be done.
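As a toy illustration of the idea (this is not Ultralytics' internal code), here is what a horizontal flip does with a flip_idx list, using a made-up five-keypoint skeleton:

```python
def flip_keypoints(keypoints, flip_idx):
    """Horizontally flip normalized (x, y) keypoints.

    Mirroring the image maps x -> 1 - x, and flip_idx then re-orders the
    keypoints so every 'right' point takes the slot of its 'left'
    counterpart (and vice versa).
    """
    mirrored = [(1.0 - x, y) for (x, y) in keypoints]
    return [mirrored[i] for i in flip_idx]

# Toy skeleton: nose, upper_jaw, lower_jaw, mouth_end_right, mouth_end_left.
kpts = [(0.5, 0.2), (0.5, 0.3), (0.5, 0.4), (0.4, 0.35), (0.6, 0.35)]
flip_idx = [0, 1, 2, 4, 3]  # swap the two mouth-end keypoints

flipped_kpts = flip_keypoints(kpts, flip_idx)
print(flipped_kpts)
```

The three centered keypoints keep their slots, while the two mouth-end keypoints trade places, so the labels still match the mirrored image.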
So please be very careful with this list; remember, it defines how your keypoint indexes will be swapped when an image is flipped horizontally. Now let's move to the last section: these are the names of all of your objects. In my case I have only one object, quadruped, so it's very simple, but please remember to specify the names and class IDs for absolutely all of your classes. I have a single class ID, 0, which means quadruped. That's pretty much all for the config.yaml.
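Putting the three sections together, the config.yaml described above looks roughly like this (the root path is a placeholder, and the 39-element flip_idx list is truncated — the values shown are the ones discussed above for my quadruped dataset):

```yaml
# Data section: root directory plus train/val image folders.
path: /home/user/pose-data   # root dir containing the images/ and labels/ folders
train: images/train
val: images/val

# Keypoints section: 39 keypoints, 3 values each (x, y, visibility).
kpt_shape: [39, 3]
flip_idx: [0, 1, 2, 4, 3, ...]   # full 39-element list, truncated here

# Classes section.
names:
  0: quadruped
```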
Now let's go back to train.py and continue. Once you have specified this configuration file and everything we have over here, the only thing left to do is execute the script — that's how easy it is to train this model. But I'm going to stop this training, because otherwise it's going to take a lot of time; I have been doing some tests already, and training locally would take forever. Still, this is exactly the process you should follow if you want to train the model in your local environment. I mentioned that I'm also going to show you how to train it in a Google Colab, so now let's go to my browser and see exactly how to run this training from a Colab. The first thing you need to do is go to your Google Drive and upload absolutely all of your data, because otherwise you won't be able to train the model from there, and you also need to upload your config.yaml file. Everything in it stays exactly the same as in the file I showed you on my local computer, except this field, the path: you need to edit it with the path to your data in Google Drive. This is very important, otherwise nothing is going to work, so please remember to edit this path; I'm going to show you exactly how
to find the location of your data in Google Drive. Now let's go back to the Google Colab I created for this tutorial, for this training; you will be able to find this notebook in the GitHub repository of today's tutorial. For now, just follow along. You can see we have only a few cells, and I'm simply going to execute them one by one. I'll start with the first one, which connects the Colab environment to Google Drive: I select my account, scroll all the way down, and click Allow. That's basically all we need to do to connect Colab with Google Drive, so that Colab can access the data you have in your Drive. We wait a few seconds, and that's pretty much it: everything has been mounted at /content/gdrive. The next step is to install Ultralytics, the Python package we need in order to use YOLOv8; the only thing we need to do is execute this cell, and that completes the installation. To continue with the next two cells,
you need to know where your data is located in your Google Drive. I'm going to execute this cell, which lists absolutely all the files in my Drive's root directory — you can see there are many, many files — and from there I just need to find where my data is. In my case it's in this folder; if I do an ls again, this is the folder's content, so I only need to locate this directory, and that's it: this is the content of the directory where my data is located. So I do something like this, and that's the content of my data directory, which contains the two folders, images and labels. Now that you know where your data lives in Google Drive, you can copy and paste this path into your config file: come here, edit this field, and put in wherever your data is located. Then you specify the location of your config.yaml file in the training cell, and once that's set you are all set — just press Enter to execute the cell, and that's pretty much it. Everything is now running; it will take some time. The first thing it does is download the weights into the Colab environment, then it gets all the data and runs the
training. OK, the training process has now completed, and you can see the results have been saved in runs/pose/train4. Now I'm going to execute this next cell, which copies the entire content of the runs directory into Google Drive. Remember, the point of this training process is to get the results — the weights, everything that was saved here — and the easiest way to do that is to copy everything into Google Drive and then just download it from there; it's the simplest way. Please remember to edit this path to the location where you want everything copied before you execute the cell. Let me show you: if I go back to my Google Drive, you can see the runs directory that was just generated, and inside it is train4, the result of the training process we just ran. Everything looks fine, and what we need to do now is download this directory to our local computer.
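Looking back, the whole Colab workflow — mount Drive, install Ultralytics, train, then copy the results into Drive — can be sketched as a few notebook cells like these (all the Drive paths are placeholders you would replace with your own):

```python
# Cell 1: connect Colab to Google Drive (only works inside Colab).
def mount_drive():
    from google.colab import drive
    drive.mount("/content/gdrive")

# Cell 2: install Ultralytics -- in a notebook this is `!pip install ultralytics`.

# Cell 3: train, pointing at the config.yaml stored in Drive (path is a placeholder).
def train():
    from ultralytics import YOLO
    model = YOLO("yolov8n-pose.pt")
    model.train(data="/content/gdrive/MyDrive/pose-data/config.yaml",
                epochs=100, imgsz=640)

# Cell 4: copy the runs/ directory (weights, plots) back into Drive.
def save_results():
    import shutil
    shutil.copytree("/content/runs",
                    "/content/gdrive/MyDrive/runs", dirs_exist_ok=True)
```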
Now, remember this was a very quick, dummy training, done for only one epoch. Obviously you will need a deeper training if you really want to train a pose detector on your data; one epoch is very unlikely to be sufficient. In my case, I had already trained the model on my data for 100 epochs before starting this tutorial, so everything is already trained and we can just analyze the results. Now let's move
to my local computer, so I can show you exactly how to validate the model I trained using YOLOv8. You can see there are many different plots here; we are plotting a lot of information, but we are going to focus on the loss function, and specifically on the pose loss: the pose loss on the training set and the pose loss on the validation set. Looking at the training set, the loss is going down — and not only is it going down, I would say the trend suggests it would keep going down for even more epochs, for more iterations. We haven't reached a plateau, and I would say we are very far from one. This is a very good sign: it means the training process is going well and the model has extra capacity, so we could keep training it and it would keep learning from this data. That's what I take from the pose loss on the training set. Looking at the same function on the validation set, it's also going down, which is good, but I have the impression that it's starting to look like a plateau. It's not entirely clear, because this is happening right at the end of the training process, but it does look as though it might plateau from here on. At the very least it's going down — that much is absolutely clear — but it's unclear what would have happened if I had trained this model for more epochs; it seems we may have reached a plateau, and that's something to keep in mind during this validation process. But now let's take a look at exactly how the model is performing
with some images. The way I'm going to do it is this: I'll open this image, which is one of the batches from the validation set showing our labels — these are not predictions, these are the annotations. I'll keep it open and also open exactly the same batch with our predictions. This is a very good way to analyze the results, because we have a lot of images and we need to look at many samples before drawing conclusions. In my previous tutorials I showed you how to train an image classifier, an object detector, and an image segmentation model with YOLOv8, and I would say today's model, this keypoint detector, is much more complex than everything we did before, so the validation process will be more complex as well. Now, out of all these images I'm going to focus on just one of them, this Dalmatian, and show you exactly what's going on. Look at this animal first: these are all of our keypoints — this is the ground truth, our annotations, and these are our predictions — and it looks very good. The keypoints around the face I would say are practically perfect; these keypoints over here are very good as well, these three keypoints too, and these two over here also. But looking at the legs, something is off: I don't really see these two keypoints, so we are not detecting the legs, and looking at the legs as a whole I think we have an issue there. You can also see that the keypoint at the end of the tail is not being detected either. So we have an issue with the tail and an issue around the legs, but everything else I would say is pretty good — I don't know what you think, but I think it's pretty good. That's one example; now let me show
you another one, from another batch. Again, these are the annotations and these are the predictions. Look at this rhinoceros: comparing annotations to predictions, we have a similar situation. Around the face everything is fine, a very good detection; over here we have a very good detection too, these three points are detected very well, and over here everything is OK. But again we have an issue around the legs — we are not detecting all the leg keypoints properly, and the same happens with this other leg — and the same with the tail, this keypoint at the end of the tail. Everything else seems to be working properly; we are detecting all the keypoints, but we have issues around the legs and the tail. Now let me show you other examples, for example this one over here. In this case the animal is in a different posture, so it's a little more challenging for the model, and you can see we are detecting the face very well — actually, we are missing this eye, but other than that all the other keypoints around the face are detected very well. These three keypoints over here are OK, this one is OK, this one too, and the other keypoints are also detected very well, but again we have an issue around the legs. And that's pretty much what I noticed looking at many of these examples. There are many different situations — different animals, different postures, different everything — but after inspecting a number of these images, my impression is that the model is performing very well overall, with a possible issue around the legs and the tail. That was my impression
from analyzing many of these pictures. Combining all this information — what we got from analyzing these images, all the keypoints, and how they were detected — with the loss function plots on the training and validation sets, my conclusion would be to do a deeper training, to train this model for even more epochs, and I'm curious what would happen in that situation. Looking at the training loss, I really like what I see: I think this model has plenty of capacity left. I think we could train for, I don't know, 50 more epochs, 100 more epochs, and be in a very good situation — the training loss would keep going down the way it has been, and it seems we are very far from a plateau there. Looking at the validation loss, though, I'm not completely sure what happens from here on. So what I would do now, in order to improve these results, or at least try to, is continue training for more epochs, then look at what happens with this loss, and then analyze these images again. That would be my next step after going through all this information. This is a very good example of how to analyze your model, your data, and your plots in a more complex case like this one, because remember that now we are trying to learn something much more complex than in our previous tutorials: not a bounding box or a mask, but the entire structure of a quadruped — trust me, it's far more complex than everything we have done so far. So this is a very good example of how to validate a model when the problem is a little harder: look at the loss function, look at what's going on in the training set and the validation set, look at some examples, and then draw your conclusions. In my case, what I would do is to
train for more epochs. Also, please remember that we are using the default values for this training: the only values we are specifying are the image size and the number of epochs, and that's it. If I show you the entire configuration file we are using, you can see it has many, many hyperparameters. So another next step, in case a deeper training isn't enough, would be to play around with the different hyperparameters and find a combination that works better for our use case. That's very important, because if you are approaching a more complex problem like this one, like the one I'm doing right now, I would say it's not very realistic to expect everything to go perfectly on the first attempt using all the default values. If the problem you are trying to solve is more complex, most likely you will need to experiment with the different hyperparameters and find a combination that suits your problem and your project.
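For example — and these particular values are just illustrative, not recommendations — model.train() accepts the hyperparameters from that configuration file as keyword arguments, so a tuning experiment can override them directly:

```python
def train_with_overrides():
    from ultralytics import YOLO

    model = YOLO("yolov8n-pose.pt")
    # Override a few of the default hyperparameters; lr0, degrees and
    # fliplr are Ultralytics training arguments, but the values below
    # are arbitrary examples to be tuned for your own problem.
    model.train(
        data="config.yaml",
        epochs=200,    # deeper training than the first run
        imgsz=640,
        lr0=0.005,     # initial learning rate
        degrees=10.0,  # rotation augmentation range
        fliplr=0.5,    # probability of a horizontal flip
    )
```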
That's what I can say about this validation process. Now let me show you something else: where the weights are located. Within this folder you will see another folder called weights, and inside it two files, best.pt and last.pt. These are the models generated by this training process. I have already mentioned this in my previous tutorials, but I'll say it again: last.pt is the model as it was at the end of the training process, and best.pt is the best-performing model from the entire training process. You have these two models and you can choose whichever you like; what I usually do is take last.pt, since I consider it the more robust model, so that's the one I usually use when making predictions. And that's pretty much all I can say about validating
this model. Now it's time to make our predictions, so let's get back to PyCharm. Let me show you this file, inference.py — this is the file we are going to use to make predictions with the model we just trained. I'll start with the import: from ultralytics import YOLO. Then I define my model path, the location of the model we just trained; in my case this is the location, and I'm selecting last.pt. I also set the path to an image I'm going to use to demonstrate the predictions; my image is located at samples/wolf.jpg. Let me quickly show you the image: this is a wolf, which is obviously a quadruped, so it's a great image for demonstrating this model. Back in PyCharm, I define my model, model = YOLO(model_path), then something like results = model(image_path), and I select the first element, because since we are predicting only one image, the first element is all we need. Then it's just a matter of iterating: for result in results, and then something like for keypoint in result.keypoints.tolist(). For now I'm only going to print the keypoints, so we make sure everything is OK. Let's see what happens... it seems I have an error, and I think I know what it is: tolist goes without the underscore. Let's try again — OK, now everything
seems fine. What I'm going to do now is import cv2, because I'm going to read the image and draw all the keypoints on top of it; that will be a very good way to show you these predictions. So cv2.imread(image_path) gives us the image, and now I'm going to call cv2.putText — it's a good idea to draw each keypoint's number on top of each keypoint. The arguments are: the image; then the keypoint number, which I'll get by iterating with for keypoint_index, keypoint in enumerate(...), converted with str(keypoint_index); then the location — remember each of my keypoints has three values, and the ones we care about right now are the first two, the X and Y position of the keypoint, so I use int(keypoint[0]) and int(keypoint[1]); then the font, which I'll set to cv2.FONT_HERSHEY_SIMPLEX; then the font size, which I'll set to 1 for now; then the color... something is not right — let me see — I think I'm not closing these brackets; yes, that was it. Now the color, which I'll set to green, and then the text thickness, which I'll set to 2, and that's all for now. Let's see how it looks: I call cv2.imshow to display the image, then cv2.waitKey(0), and that's pretty much it. What we are doing here is showing the image with all the keypoints drawn on top, each labeled with its keypoint number, so it's easier to see exactly what we are detecting, and as a result everything looks pretty good.
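Collected into one place, the inference script we just built step by step looks roughly like this. The model path is from my training run, so adjust it to yours, and the keypoints call shown (result.keypoints.tolist()) matches the Ultralytics version used in the video — newer releases may expose keypoints slightly differently:

```python
def pixel_coords(keypoints):
    """Keep only x and y from each (x, y, v) keypoint, as int pixel coords."""
    return [(int(k[0]), int(k[1])) for k in keypoints]

def main():
    import cv2
    from ultralytics import YOLO

    model = YOLO("runs/pose/train4/weights/last.pt")  # path from my run
    image_path = "samples/wolf.jpg"

    img = cv2.imread(image_path)
    result = model(image_path)[0]  # one image -> take the first result

    # Draw each keypoint's index at its predicted (x, y) position.
    for idx, (x, y) in enumerate(pixel_coords(result.keypoints.tolist())):
        cv2.putText(img, str(idx), (x, y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    cv2.imshow("image", img)
    cv2.waitKey(0)

if __name__ == "__main__":
    main()
```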
But I'm going to tweak the visualization a bit: I'll set the font size to 0.5 and press play again. Now the visualization is a little better, and you can see everything looks pretty good: we are plotting all the keypoints on top of our image, and this is exactly how you make predictions using YOLOv8. The last thing I'm going to show you is this file with the class names, so we can check exactly what we are detecting. You can see that 0 is the nose, then upper jaw, lower jaw, mouth end right, mouth end left, and so on. For example, keypoint 21 is somewhere around here, which is back middle — that makes sense; 37 is around here, body middle; and 36 is belly bottom. So everything looks pretty good, although you can see we still get the issues we noticed in the other pictures: the legs are not detected very well, and neither is the end of the tail. But everything else looks pretty good, so this is going to be all for this tutorial
on YOLOv8. My name is Felipe, I'm a computer vision engineer, and this is exactly the type of tutorial I make on this channel. If you enjoyed this video, I invite you to click the like button, and I also invite you to take a look at these two videos over there. This is going to be all for today — see you on my next video.