So this is exactly what you will be able to do with today's tutorial. In this video we're going to work with pose detection using YOLOv8, and I'm going to show you the entire process: how to annotate your custom data for free using a computer vision annotation tool, how to prepare your data and your file system for training this pose detector, how to do the training on your local computer and also from a Google Colab, and how to do a super comprehensive evaluation of the model you trained. This is a much more complex problem: in my previous tutorials I showed you how to train an image classifier, an object detector, and an image segmentation model using YOLOv8, and I would say that today's model, this keypoint detector, is much more complex than everything we did before. This is going to be an amazing tutorial. My name is Felipe, welcome to my channel, and now let's get started. Now let me show you the data
we are going to use in this tutorial: the AwA Pose dataset. Let me show you exactly what this data looks like. You can see that these are pictures of many different animals;
currently we are looking at antelopes, many different pictures of antelopes, and if I scroll down in this directory you are going to see I also have other animals. For example, here is a bobcat, which is a sort of feline, and you can see these are many different pictures of this animal. If I scroll down a little more you are going to see I also have buffaloes, and if I continue scrolling down you are going to see pictures of other animals; for example, here I have a Chihuahua. You get the idea: we have pictures of many, many different animals, and all these animals are quadrupeds, because this is a quadruped keypoint detection dataset. Now let me show you the key points
we are going to be detecting for each one of these animals. You can see there are many different key points; we have 39 key points in total, which is a lot, and we are detecting many different parts: for example the nose, the eyes, the jaw, the tail, the legs, and also the ears and the horns (or antlers, or whatever they're called). We are detecting many, many different parts in these quadrupeds. So this is exactly the data we are going to be using today; I thought it was a very cool dataset to use for pose detection. And now let's continue: I'm
going to show you how to do the entire process of training a pose detector using YOLOv8 on your custom data. In my case, the data I am going to use in this tutorial is already annotated, so I already have the annotations for this data, but if you are training this pose detector on your own custom data then most likely you will need to annotate the data yourself. So I'm going to show you how to do the entire annotation process using CVAT, which is a very popular and very awesome annotation tool for computer vision. Let me show you how to do it: I'm going to cvat.ai, this is CVAT's website, and I'm going
to click here where it says Start using CVAT. I'm going to show you how to create a project, how to create a task, and how to do all the annotation. Now I'm going to Projects, I'm going to click the plus button, and I'm going to click Create new project. This is going to be keypoint detection, quadruped keypoint detection, which is exactly what we are going to be doing. Then Add label, and I'm going to add quadruped, continue, and that's pretty much all: Submit and open. This is where you are going to add absolutely all the labels you have in your custom data; in my case I only have one label, which is quadruped.
Now let's continue. I'm going to create a task: Create new task. The name of this task will be something like quadruped keypoint detection task 001, and I am going to add an image. I'm going to show you how to annotate this data with only one image, so I'm only going to select the first one, and then I'm going to click Submit and continue. We have to wait a couple of minutes until the data is uploaded to the server, and once everything is completed we need to go to Tasks. This is our project and this is the task we have just created, and I'm going to click Open. So this is pretty much all; now I'm going to
click here, which is going to open the task, and now we need to start our annotation process. You need to click where it says Draw new points, and you need to select the number of points you are going to annotate; in my case I'm going to annotate 39 points, but you need to select as many points as you are going to annotate. Now I'm going to click Shape, and we need to start our annotation process. Something that's very, very important: when you are annotating your key points, you need to follow a given order. If I show you this image again, you can see that we have many different key points; we have the location of all the key points, but we don't really have any information regarding the order of these key points. This is very important, because you cannot follow a random order; you need to follow always the same order when you are annotating your data. So this is
for example the order I am going to follow in this tutorial. You can see that the first key point I'm going to annotate is nose, then upper jaw, then lower jaw, mouth end right, and so on; you need to specify a given order for your data. Now I'm going to start this annotation process. The first point is nose, which I'm going to set over here; then the next one is upper jaw, which is going to be something like this; lower jaw here; mouth end right, and this is the right from the perspective of this animal, so this is going to be here; now mouth end left, and I don't really see the mouth end left, but I'm going to say it's around here. I'm going to share a few comments later in this tutorial regarding the visibility of our key points, but for now let's just continue. The next one is right eye, then right ear base, which is here, and then right ear end, which is over here. I'm just going to continue with all of this list, and I'm going to resume this video when I'm done. These are the last two: body middle right, which is around here, and body middle left, which I don't see, but it's around here.
You can see that these are all my 39 key points. Now let me show you how to export this data, but before that, please remember to click Save; it's always a good practice to click Save. And you don't only need the key points: you also need to draw a bounding box around your object. This is very important, and I'm going to tell you why in a few minutes, but for now remember that you not only need to annotate all of your key points, you also need to draw a bounding box enclosing your object. This is how I did it, and I'm going to click Save again. This is the only image I'm going to annotate, but please remember to follow exactly the same process for all of your images. I'm now going to Tasks, and I'm going to show you how to export this
data. You need to click here, then Export task dataset. Now you can see there are many different formats in which you can export your data, and one of these options is COCO Keypoints 1.0. This is the exact format we need for our data, but I have tried to export the data into this format and for some reason it's not working, so I'm going to show you how to do it with CVAT for images 1.1. So click here, then OK, and then you just have to wait until everything is downloaded. Once everything is fully exported you are going to see a zip file, and within this file there will be another file called annotations.xml. Now let me open this file so I can show you what it looks like. You are going to see something like this, and at the bottom of this file you are going to see all the images you have annotated and their annotations. So this is exactly the
data you are going to generate using CVAT. Now let me show you something else. I have created a Python project for today's tutorial, and let me show you a script I created in this project. This script will be super useful, because now that you have your annotations, now that you have your data, you need to convert your annotations into the exact format you need in order to use this pose detector with YOLOv8. Basically, you need to specify two variables: one of them is the location of your annotations.xml file, and the other is the location of the directory where you want all your data to be saved. This script is going to parse through this XML file, it's going to extract all of your annotations, and it's going to save them in the exact format you need in order to use YOLOv8. So remember to specify these two paths, these two variables: the location of your XML file, and where you want all of your newly created annotations to be saved, your output directory. Once you have set these two variables, the only thing you need to do is to run this script and everything will run super smoothly. Remember, this script will be available in the GitHub repository of today's tutorial, so you can just go ahead and use it in order to convert all of your data into the format you need to use YOLOv8.
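Just to make this concrete, a minimal sketch of such a conversion script could look something like this. It assumes the CVAT for images 1.1 export with exactly one box and one points shape per image, flat image names, and a fixed visibility of 2 for every point; the paths are placeholders, and the actual script in the repository may differ:

    import os
    from xml.etree import ElementTree

    input_file = './annotations.xml'   # the file exported from CVAT (placeholder path)
    output_dir = './labels'            # where the YOLO-format .txt files go (placeholder path)

    os.makedirs(output_dir, exist_ok=True)
    tree = ElementTree.parse(input_file)

    for image in tree.getroot().iter('image'):
        w, h = float(image.attrib['width']), float(image.attrib['height'])
        box, points = image.find('box'), image.find('points')
        if box is None or points is None:
            continue
        # CVAT stores the box as corner coordinates; YOLO wants a normalized center + size
        xtl, ytl = float(box.attrib['xtl']), float(box.attrib['ytl'])
        xbr, ybr = float(box.attrib['xbr']), float(box.attrib['ybr'])
        xc, yc = (xtl + xbr) / 2 / w, (ytl + ybr) / 2 / h
        bw, bh = (xbr - xtl) / w, (ybr - ytl) / h
        row = ['0'] + [f'{v:.6f}' for v in (xc, yc, bw, bh)]   # class id 0 = quadruped
        # the points attribute holds 'x1,y1;x2,y2;...' in the order you annotated
        for pair in points.attrib['points'].split(';'):
            x, y = map(float, pair.split(','))
            row += [f'{x / w:.6f}', f'{y / h:.6f}', '2']   # 2 = labeled and visible (assumption)
        name = os.path.splitext(image.attrib['name'])[0] + '.txt'
        with open(os.path.join(output_dir, name), 'w') as f:
            f.write(' '.join(row) + '\n')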
And now let's continue: I'm going to show you how you need to structure all of your data and your file system so it complies with YOLOv8. You can see that
this is a directory called data, and this is the root directory where my data is located. You need a root directory where your data will be located, and within this root directory you can see I have two folders: one of them is called images and the other one is called labels. It's very important that you name these two folders exactly like this: one should be called images and the other should be called labels. Now if I open one of these folders, you can see I have two other folders: one is called train and the other is called val, and again it's very important that you name these directories exactly like this. Within train is where we will have all of our training data, all the images we are going to use as training data, and within val it's exactly the same: these are all the images we are going to use as validation data, as our validation set. So within images we have two directories, train and val, and within each one of these directories is where we have all the images we are going to use in order to train this model. But we also have additional data, which are the labels. Now let me show you what this other folder looks like. Within labels we also have two directories, which are also called train and val, and again one should be called train and the other should be called val. If I open
the train directory, you can see we have many, many txt files, and these are basically all of our labels for the training data, for all of our training images. If I go back to images/train, you can see that for absolutely every single one of these images we have an annotation file; for every single one of these images we are going to have a txt file in this folder. For the other directory, val, it's exactly the same, but for the validation data, the validation images. So if I go back again: we have the root directory, then images and labels; within images we have two directories, train and val, and within each one of these directories is where we have all of our images; and if we go to labels we also have two directories, train and val, and within each one of these directories is where we have all of our labels. So this is exactly how you need to structure your file system.
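Putting it all together, the layout looks like this (a sketch; the comments are mine):

    data/
    ├── images/
    │   ├── train/     # training images
    │   └── val/       # validation images
    └── labels/
        ├── train/     # one .txt per training image, same base name
        └── val/       # one .txt per validation image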
Now let me show you one of these annotation files, one of these label files, from the inside; let me show you what they
look like. This is a random annotation file, a random txt file, and this is exactly how you need to put the data inside these files. The annotations are specified in the COCO keypoint format, which is a very popular format for pose detection. Now let me do something (which I'm obviously not going to save) so it's much easier to show you how this annotation format works. Basically, you can see the first number is a zero, and this is our class ID; in my case I only have one class, which is quadruped, so in my case this number will always be zero, but if you are making this project and you have many different classes, please remember that this number should be the class ID, so with different classes you will have different numbers here. Now the next four
numbers are the bounding box of your object. Remember, in CVAT when we were annotating this data, I showed you that we not only need to annotate the key points but also the bounding box, and we did annotate the bounding box. So the four elements that come after the class ID are the bounding box, and this bounding box is specified in the YOLO format, which is the x and y position of the center of the bounding box and then the width and the height. This is very important: these two numbers are the x, y position of the center of your bounding box, and then come the width and the height. Then for all of the other numbers, let me show you: we have two numbers, which are floats, and then the number 2; then exactly the same pattern, two numbers and another 2; then two numbers and another 2; and then we have three zeros. This looks very
strange, so now let's go back to my browser, because I want to show you this website, cocodataset.org, and this is where we are going to see exactly how this format works. If I go to the keypoint detection page, you can see the explanation of how this format works, and if I read this part here, you can see that absolutely every single key point is specified as x and y plus a visibility flag v. This means that for every single key point we are going to have three values: the x and y position of that given key point, and another value, v, which is the visibility. Remember I said we were going to talk about visibility later in this tutorial? This is that moment. You can see that v has three possible values: v can be 0, which means the key point is not labeled, and in this case x and y are going to be 0 too; v can be 1, which means the key point is labeled but it's not visible; or v can be 2, which means the key point is labeled and it's also visible. If we go back to this file, to the annotations, you
can see that if we start over here, we have two numbers and then a 2, which means this key point is labeled and also visible. If we continue, we have two numbers and then another 2, which means this other key point is also visible; then exactly the same, two numbers and a 2. And if we continue, we have three zeros, so we are in the first situation: v equals 0, x and y are also 0, and this means the key point is not labeled in this image. So, long story short: after the bounding box, all the other numbers are the key points, with two values for the x and y position and a third value for the visibility of that given key point. Now, this is one of the possible formats for your data, and it is going
to work just fine, but YOLOv8 also supports a keypoint annotation with only two values. This means that if you don't have the visibility information for your key points, it doesn't matter, because YOLOv8 also lets you input your key points with only the x and y coordinates. So, long story short: the first number is the class ID, then four numbers for the bounding box, and then all of the other numbers are the key points. You can specify your key points with three values each, the x, the y, and the visibility, or with only two values each, just the x and y coordinates. This is the way you need to label your data, the way you need to structure all of your annotations, and please remember to do it this way; otherwise it's not going to work.
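So, schematically, one line of an annotation file looks like this (the numbers are made up, and the trailing dots stand for the remaining key points):

    # class  xc     yc     w      h      x1     y1     v1  x2     y2     v2  ...
    0        0.512  0.438  0.640  0.520  0.471  0.301  2   0.489  0.322  2   ...

In the real file all the values sit on a single line separated by spaces, with one line per object in the image.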
Now I'm just going to press Ctrl+Z, because obviously I'm not going to save those changes. And that's pretty much all about how
to format your data, your file system, and your annotations into the exact format you need in order to train this pose detector using YOLOv8. Now let's go back to PyCharm,
to the PyCharm project I created for today's tutorial. The first thing you need to do if you want to train this pose detector using YOLOv8 is to install the project's requirements, which is basically ultralytics; please remember to install this package before starting the training, because otherwise you will not be able to train anything. Once you have installed ultralytics, let's go back to this file I created, which is train.py. I'm going
to show you exactly what you need to code in this file in order to do your training. To do so, let's go to the Ultralytics website, go to the Pose page, and scroll down to this section over here. The only thing I'm going to do is copy and paste this line, and then copy and paste this other line; this is basically all we need to do in order to train this model. Obviously I also need the import: from ultralytics import YOLO, and that's pretty much all. The first line we can just leave as it is, with its default value, but to the second one I am going to make a couple of changes: I'm going to change the number of epochs (I'm going to train for only one epoch for now), and I'm also going to change the location of the configuration file; I'm going to use this file, which is config.yaml.
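So the whole train.py is something like this (a minimal sketch; yolov8n-pose.pt and imgsz=640 are the values from the Ultralytics documentation, adjust them to your needs):

    # train.py -- the two lines copied from the Ultralytics pose docs, with my edits
    from ultralytics import YOLO

    # load a pretrained pose model (the docs use the nano variant)
    model = YOLO('yolov8n-pose.pt')

    # train on the custom dataset described by config.yaml, for one epoch for now
    results = model.train(data='config.yaml', epochs=1, imgsz=640)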
Now let me show you what this config.yaml looks like. This is the configuration file I am going to use in this tutorial; we have three sections, one for
data, then key points, and then classes. Let's go to the data section first. This is where you're going to specify all the locations of your data, your images and your labels. Basically, you need to specify the root directory, the directory containing your data, which in my case is this one; remember, the root directory is the directory which contains the images and labels folders. Then you need to specify the location of the training images and the validation images. If you have set everything up as I showed you a few minutes ago, you can just leave these two lines with these values and everything will work just fine; the only thing you need to edit is the location of your root directory. Now let's go to this section over here, which is the key points, and we have two keywords,
the keypoint shape and the flip index, and these two keywords are completely new for us; this is something we haven't seen in any of my previous tutorials about YOLOv8. You can see that in my case the keypoint shape says 39 3; that's because I have 39 key points and I'm using the x, y, and v format, three values for every single key point, and that's why I have a 3 over here. So the first value is how many key points you have in your data, and the second is which format you are using: if you're using the x, y format you will need to specify a 2, or if you're using the x, y, and v format you will need to set a 3, as I am doing over here. That's it for the keypoint shape; now let me explain what the flip
index is. In order to explain what this keyword means, I made a drawing over here where I'm going to show you exactly what it means. You can see this is a random image in my dataset, and actually this is the same image I used to show you what the annotation process looks like for this data; you can see this is a quadruped with all of its key points drawn on top. Now let me show you what happens if I flip this image horizontally. This is what I get: you can see it's exactly the same image, but flipped horizontally. If I flip an image horizontally, everything that used to be on one side is now on the other side: everything that used to be the right side is now the left side, and the other way around. That's simply what happens when you flip an image horizontally. But remember, we had many different key points, and many of these key points are related to one of the sides: for example, we had a key point for the right eye, key points for the right ear, key points for the right legs, and the same for the left eye, the left ear, and the left legs. Many of our key points are related to one of the sides, so if we flip the image horizontally, we should be doing something with all of these keypoints. When we are training a model using YOLOv8,
one of the stages in the training process is something called data augmentation. Data augmentation means that we are taking the data and applying different transformations to it, and one of the transformations is flipping the image: some of our images will be flipped at random, and every time we do a horizontal flip we are going to have a situation like this. So now let's go back to this list, which is the list of all the different key points we have in this dataset; remember, I already showed you this list when I was annotating the image, and we start with the nose, then the upper jaw, then the lower jaw, and so on. You can see that some of these key points are related to the right side, some to the left side; we have many key points over here related to the right side, then many key points related to the left side, and then other key points which are not related to either side, for example neck base, neck end, throat, back; these are generic key points. We will need to do something with all the key points which are related to one of the sides, for example these two, then all of these over here, and so on. You get the idea; that's exactly what this flip index keyword does, that's the intuition behind it. So let's go through this
list. You can see the first element is nose, and if we think about a nose, it's right in the middle, so nothing is going to happen when we flip the image; the nose will remain the nose. The next element is the upper jaw, and it's exactly the same: the upper jaw will remain the upper jaw after we flip the image horizontally, and the same happens with the lower jaw. But when we get to this element, the mouth end right, we have an issue, because when we flip the image horizontally, the mouth end right becomes the mouth end left; and the next element, the mouth end left, becomes the mouth end right. You get the idea: these two key points are swapped when we flip our image. Now let's take a look at this list over here, which is the value for the flip index, and you can see that the first elements are 0, then 1, then 2, then 4, 3. So we are swapping these two values: instead of 3, 4, which would be the natural order, we have 4, 3, and these are exactly the indexes of those two key points in the keypoint order. Long story short: the only thing we need to do in order to fix the issue we have when flipping our images horizontally is to go through all of our key points, and for all the key points related to the right side, swap them with the left side, and the other way around. That's what we need to specify in this list: this is how the flipping will be done, so please be super careful with it. Remember, this list defines how your indexes will be swapped when the image is flipped horizontally.
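To make the swap concrete, here is how the first five entries line up (the underscored names are just my own rendering of the keypoint list shown on screen):

    # index : keypoint          after a horizontal flip, its label belongs at index:
    # 0     : nose              0 (unchanged, it sits on the midline)
    # 1     : upper_jaw         1 (unchanged)
    # 2     : lower_jaw         2 (unchanged)
    # 3     : mouth_end_right   4 (it becomes the mouth end left)
    # 4     : mouth_end_left    3 (it becomes the mouth end right)
    # so the flip index list starts: 0, 1, 2, 4, 3, ...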
Now let's move to the last section, which lists the names of all of your classes. In my case I only have one class, which is quadruped, so this is very simple, but please remember to specify the names for absolutely all of your class IDs; in my case I only have one class ID, which is 0 and means quadruped. That's pretty much all for the config.yaml.
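Putting the three sections together, the file looks roughly like this. Note that in the actual Ultralytics config the two keypoint keywords are spelled kpt_shape and flip_idx; the root path here is a placeholder, and flip_idx is truncated because its 39 entries depend on your own keypoint order:

    # config.yaml -- a sketch of the file described above
    path: /home/user/data      # root directory containing images/ and labels/ (placeholder)
    train: images/train        # training images, relative to path
    val: images/val            # validation images, relative to path

    kpt_shape: [39, 3]         # 39 key points, 3 values each (x, y, visibility)
    flip_idx: [0, 1, 2, 4, 3, ...]   # "..." = the remaining 34 indices, per your order

    names:
      0: quadruped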
Now let's go back to train.py and let's continue. Once you have specified this configuration file and
everything we have over here, the only thing you need to do is to execute this script, and that's all; that's how easy it is to train this model. But I'm going to stop this training, because if I train this model locally it's going to take a lot of time; I have been doing some tests already, and it's going to take forever. Still, this is exactly the process you should follow if you want to train this model in your local environment. But I mentioned that I'm also going to show you how to train it in a
Google Colab, so now let's go to my browser and see exactly how to do this training from a Google Colab. The first thing you will need to do is go to your Google Drive and upload absolutely all of your data, obviously, because otherwise you will not be able to train this model from your Google Drive. You will also need to upload your config.yaml file, and everything will be exactly the same as the file I showed you on my local computer, except you will need to edit this field, the path: you will need to set it to the path of your data in Google Drive. This is very important, otherwise nothing is going to work, so please remember to edit this path, and I'm going to show you exactly how
to find the location of your data in your Google Drive. Now let's go to this Google Colab; this is the notebook I created for this tutorial, for this training, and obviously you will be able to find it in the GitHub repository of today's tutorial, so for now just follow along. You can see we have only a few cells, and the only thing I'm going to do is execute them one by one. I'm going to start with the first one, which connects my Google Colab environment with Google Drive: the only thing I have to do is select my account, scroll all the way down, and click Allow. That's basically all we need to do in order to connect our Google Colab with Google Drive, so Colab will be able to access the data you have in your Drive. We have to wait a few seconds, and that's pretty much all; you can see that everything has been mounted at /content/gdrive. Now the next step is to install ultralytics, because remember, ultralytics is the Python package we need in order to use YOLOv8, and the only thing we need to do is execute this cell. Everything is now completed.
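So far the two cells look roughly like this (a sketch of what we just ran):

    # cell 1 -- mount Google Drive at /content/gdrive
    from google.colab import drive
    drive.mount('/content/gdrive')

    # cell 2 -- install the ultralytics package
    !pip install ultralytics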
Now, in order to continue with the next two cells, you need to know where your data is located in your Google Drive. The only thing I'm going to do
is to execute this cell, and this is going to list all the files in the root directory of my Google Drive. You can see these are many, many files, and from here the only thing I need to do is find where my data is located. In my case it's in this folder; if I do an ls again, this is the content of this folder, so I just keep locating directories until I get to it: this is the content of my data directory, which contains the two folders, images and labels. Now that you know where your data is located in your Google Drive, you can just copy and paste this path into your config file: you can come here, edit this field, and put wherever your data is located. Then you need to specify the location of your config.yaml file in the training cell, and once you set this location over here you are
all set, and the only thing you need to do is execute this cell; that's going to be pretty much all.
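The training cell is essentially the same code as the local train.py, just pointing at Drive; the path below is a placeholder for wherever your config.yaml actually lives:

    # training cell -- replace the config path with the location you found with ls
    from ultralytics import YOLO

    model = YOLO('yolov8n-pose.pt')
    results = model.train(data='/content/gdrive/MyDrive/data/config.yaml', epochs=1, imgsz=640)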
Now everything is being executed. This is going to take some time: the first thing it's going to do is download the weights into this Google Colab environment, then it's going to get all the data, and then it's going to do the training. Okay, now the training process has been completed, and you can see that the results have been saved in runs/pose/train4. Now I'm going to execute this last cell, which copies the entire content of the runs directory into your Google Drive. Remember, the idea at the end of this training process is to download the results, the weights, everything which has been saved here, and I would say the easiest way to do it is to copy everything into Google Drive and then just download it from there. Please remember to edit this path to the path where you want everything to be copied when you execute this cell.
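The cell itself can be as simple as this (the destination path is a placeholder; point it wherever you want the results in your Drive):

    # final cell -- copy the training results into Google Drive
    !cp -r /content/runs '/content/gdrive/MyDrive/runs'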
Let me show you: if I go back to my Google Drive, you can see this is the runs directory, which was just generated,
and within this directory we have train4, which is the result of the training process we have just executed. So everything seems to be okay, and what we need to do now is download this directory to our local computer. Remember, this was a very dummy training, a training we did for only one epoch; obviously you will need to do a deeper training if you really want to train your pose detector on your data, because one epoch is very unlikely to be sufficient. In my case, I have already trained the model with my data, for 100 epochs, before starting this tutorial, so everything is already trained and we can just analyze the results. Now let's move to my local computer so I can show you exactly how to validate the model I trained using YOLOv8.
You can see there are many different plots, many different functions; we are plotting a lot of information, but we are going to focus on the loss function, and specifically on the loss related to the pose: the pose loss on the training set and the pose loss on the validation set. If we look at the training set, you can see that the loss is going down, and not only is it going down, I would say it's going to continue going down for even more epochs; you can see the trend, and we haven't reached a plateau. I would say we are very far away from any plateau, so this is a very good sign: it means the training process is going super well and the model has extra capacity, so we could continue training this model and it would continue learning more about this data. That's what I take away from looking at the pose loss on the training set.
If we look at exactly the same function but on the validation set, we can see that it's going down, which is a good thing, but I have the impression that it's starting to plateau. It's not very clear, because it's happening right at the end of this training process, but it somehow seems like this is going to be a plateau from now on. At the very least we can see that it's going down, that's absolutely clear, but it's unclear what would have happened if I had trained this model for more epochs; it seems we may have reached a plateau, and that's something we need to keep in mind in this validation process. Now let's take a look at exactly how it's performing
with some images. The way I'm going to do it is like this: I am going to open this image, which is one of the batches in the validation set, and these are our labels, not our predictions but the labels, the annotations. I'm going to keep this open, and I'm going to open exactly the same batch but with our predictions. This is going to be a very good way to analyze the results, because now we have a lot of images, and we need to take a look at more samples to draw conclusions. Remember, this is a much more complex problem than the image classifier, object detector, and image segmentation model from my previous tutorials, so this validation process will be more complex as
well. Now, of all of these images, I'm going to focus on only one of them, this dog, this Dalmatian, and I'm going to show you exactly what's going on. Let's focus on this animal first: these are all of our key points, this is our ground truth, these are all of our annotations, and these are all our predictions. You can see that it looks pretty good. I would say that all of these key points around the face are practically perfect, and if we look at these keypoints over here, they're very good as well; these three key points are also very good, and these two key points over here too. But if we look at the legs, something is going on, because I don't really see these two key points, so we are not detecting the legs. Looking at the legs overall, I think we have an issue there, and you can also see that this key point at the end of the tail is not being detected either. So we have an issue in the tail and an issue around the legs, but everything else I would say is pretty good; I don't know what you think, but I think it's pretty good. So this is one of the examples, and now let me show
you another one, in another batch; again, these are the annotations and these are the predictions. Let me show you what happens with this rhinoceros: these are our annotations and these are our predictions, and you can see we have a similar situation. Around the face I would say everything is just okay, we have a very good detection; over here we have a very good detection too, these three points are very well detected; over here everything is okay; and then we also have an issue around the legs, where we are not detecting all the key points properly, the same over here with this other leg, and the same in the tail, this keypoint which is the tail end. Everything else seems to be working properly, we are detecting all the key points, but we have an issue around the legs and around the tail. Now let me show you other examples, for example this one over
here. You can see that in this case the animal is in a different posture, so it's a little more challenging for the model, and in this case we are detecting the face very well; actually, we're not detecting this eye, but other than that all the key points around the face are very well detected. You can see these three key points over here, everything is okay, this one is okay, this one too, and we have other key points which are also very well detected, but again we have an issue around the legs. That's pretty much what I noticed by looking at many of these examples. We are going to see many different situations, because there are different animals, in different postures, different everything; but after inspecting a few of these images, I had the impression that the model is performing very well overall, but we may have an issue around the legs and around the tail. That was my impression
from analyzing many of these pictures. So, combining all the information we got by analyzing these images, all the keypoints and how they were detected, with the plots of the loss function on the training set and the validation set, my conclusion is to do a deeper training, to train this model for even more epochs, and I'm curious to see what would happen in that situation. Because if I look at the training loss, I really like what I see: I think this model has way more capacity, and I think if we trained for, I don't know, 50 or 100 more epochs, we would be in a very good situation; the training loss would continue to go down, and it seems we are very far away from the plateau. But if I look at the validation loss, I'm not completely sure what happens from here on. So what I would do now, in order to try to improve these results, is to continue training for more epochs, see what happens with this loss, and then see what happens by analyzing these images again. That would be my next step. So this is a very good example of how to analyze
your model, your data, your plots, and so on in a more complex example like this one. Remember, we are now trying to learn something way more complex than in our previous tutorials: we are not trying to learn a bounding box or a mask, but the entire structure of a quadruped, and trust me, that's way more complex than everything we have done so far. So this is a very good example of how to validate a model when the problem is a little more complex: take a look at the loss function, at what's going on in the training set and the validation set, also take a look at some examples, and then draw some conclusions. In my case, what I would do is train for more epochs. Also, please remember that we are training this model using all the default values; the only values we are specifying are the image size and the number of epochs, and that's it. If I show you the entire configuration file we are using, it's this one, and you can see that we have many, many
hyperparameters. So in case a deeper training is not enough, another next step would be to play around with the different hyperparameters and find another combination which is better for our use case. That's very important, because if you are approaching a more complex problem like this one, I would say it's not very realistic to expect everything to go super well on the first attempt using all the default values. If the problem you are trying to solve is much more complex, then most likely you will need to play around with the different hyperparameters and find a combination that suits your problem, your project. That's what I can say about this validation process, and now let me show
you something else, which is where the weights are located. Within this folder you will see another folder called weights, and within weights you are going to see two files, best.pt and last.pt; these are the models you generated with this training process. This is something I have already mentioned in my previous tutorials, but I'm going to say it again: last.pt is the model as it was at the end of your training process, and best.pt is the best model produced across the entire training process. So you have these two models and you can just choose the one you like the most; what I usually do is take last.pt, because I consider it a more robust model, so this is the one I usually use when making my predictions. And that's pretty much all I can say about validating
this model, and now it's time to make our predictions. Let's get back to PyCharm and let me show you this file, called inference.py; this is the file we are going to use in order to make predictions with the model we just trained. Let me show you how to do it. I'm going to start with the import: from ultralytics import YOLO. Then I'm going to define my model path; I am going to specify the location of the model we just trained, which is this one, and I'm going to select last.pt. I'm also going to set the path to the image I'm going to use in order to show you how to make predictions; my image will be located at samples/wolf.jpg. Let me show you super quickly the image I'm going to use: let's go to samples, and this is exactly the image, a wolf, which is obviously a quadruped, so it's going to be an amazing image in
order to show you how to use this model. Now let's get back to PyCharm and do something like this: I'm going to define my model as YOLO(model_path), and then we're going to say something like results = model(image_path), and I'm going to select the first element, because as we are predicting only one image, the first element will be just fine. Then it's just about iterating: for result in results, and then something like for keypoint in result.keypoints.tolist(). For now the only thing I'm going to do is print the keypoints, so we make sure everything is okay, and let's see what happens. Okay, it seems I have an error, and I think I know what it is: tolist goes without the underscore. Let's see now. Okay, now everything
seems to be okay. What I'm going to do now is import cv2, because I am going to read the image and plot all the keypoints on top of it; that's going to be a very good way to show you these predictions. So cv2.imread(image_path), this is going to be image, and now I'm going to call cv2.putText, because it's a very good idea to put the keypoint number on top of each one of these key points. So this is going to take image, then the keypoint number; let's do it like this: keypoint_indx, keypoint in enumerate(...), and then str(keypoint_indx). Now the location: remember how my key points look, we have three values, and the values we care about at this moment are the first two, because these are the x, y position of the key point, so I'm just going to do something like int(keypoint[0]) and int(keypoint[1]). Now I have to select the font, which I'm going to set to cv2.FONT_HERSHEY_SIMPLEX, then the font size, which I'm going to set to 1 for now, then the color... something is not right, let me see; I think I'm not closing these brackets, that's the reason, one, two, okay, there, now everything's okay. Now the color, which I'm going to set to green, something like this, and then the text thickness, which I'm going to set to 2, and this is going to be all for now. Let's see how it looks: I'm going to call cv2.imshow with my image, then cv2.waitKey(0). And that's pretty
much all: what we are doing here is showing the image and drawing all the key points on top of it, with the keypoint number on each key point, so it's going to be easier to know exactly what we are detecting. As a result, everything looks pretty good, but I'm going to do something to improve the visualization: I am going to make the font size 0.5 and press play again. Okay, now the visualization is a little better, and you can see that everything looks pretty good; we are plotting all the key points on top of our image, and this is exactly how you can make predictions using YOLOv8.
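Putting the whole script together, inference.py looks roughly like this; it's a consolidated sketch of what we just built, and here I read the (x, y) pairs from results.keypoints.xy, which is one documented way to access them, so the exact attribute access may differ slightly from what you saw on screen:

    # inference.py -- sketch; the paths match this tutorial, adjust them to yours
    import cv2
    from ultralytics import YOLO

    model_path = './runs/pose/train4/weights/last.pt'
    image_path = './samples/wolf.jpg'

    model = YOLO(model_path)
    results = model(image_path)[0]   # one image, so take the first (and only) result

    image = cv2.imread(image_path)
    for keypoints in results.keypoints.xy.tolist():   # one list of points per detected animal
        for keypoint_indx, (x, y) in enumerate(keypoints):
            # draw the keypoint index at the predicted (x, y) location
            cv2.putText(image, str(keypoint_indx), (int(x), int(y)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    cv2.imshow('image', image)
    cv2.waitKey(0)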
The last thing I'm going to show you is this file with the class names; let's take a look at exactly what
we are detecting. You can see here that 0 is the nose, then upper jaw, lower jaw, mouth end right, mouth end left, and so on. For example, for 21 we are somewhere around here, which is back middle, and it makes sense; then 37, we are around here, body middle right; and then 36, belly bottom. So everything looks pretty good, and you can see that we are still getting the issues we noticed in the other pictures: the legs are not very well detected, and the tail end is not very well detected either, but everything else looks pretty good. So this
is going to be all for today. If you enjoyed this video, I invite you to click the like button and to subscribe to my channel. My name is Felipe, I'm a computer vision engineer, and these are exactly the type of videos I make on this channel. This is going to be all for today, and see you in my next video.