Train YOLOv8 pose detection on custom data | Keypoint detection | Computer vision tutorial

Captions
This is exactly what you will be able to do with today's tutorial. In this video we're going to work with pose detection using YOLOv8, and I'm going to show you the entire process: how to annotate your custom data for free using a computer vision annotation tool, how to prepare your data and your file system for training this pose detector, how to run the training on your local computer and also from a Google Colab, and how to do a very comprehensive evaluation of the model you trained. This is a more complex problem than anything in my previous tutorials, where I showed you how to train an image classifier, an object detector, and an image segmentation model with YOLOv8; I would say today's model, this keypoint detector, is more complex than everything we did before. This is going to be an amazing tutorial. My name is Felipe, welcome to my channel, and now let's get started.

First, let me show you the data we are going to use in this tutorial: the AwA Pose dataset. These are pictures of many different animals. Right now we're looking at antelopes, and if I scroll down this directory you'll see other animals too: here is a bobcat, which is some sort of feline; scrolling a little further there are buffaloes; and further down still, for example, a Chihuahua. You get the idea: we have pictures of many, many different animals, and all of them are quadrupeds, because this is a quadruped keypoint detection dataset. Now let me show you the keypoints we are going to detect for each of these animals. There are 39 keypoints in total, which is a lot, and they cover many different parts: the nose, the eyes, the jaw, the tail, the legs, the ears, and the horns, or antlers, or whatever they're called, it doesn't matter. So this is exactly the data we are going to use today; I thought it was a very cool dataset for pose detection. Now let's continue.

I'm going to show you the entire process of training a pose detector using YOLOv8 on your custom data. In my case the data I'm using is already annotated, but if you are training this pose detector on your own data, you will most likely need to annotate it yourself, so I'm going to show you the entire annotation process using CVAT, which is a very popular and very good annotation tool for computer vision. Go to cvat.ai, CVAT's website, and click where it says 'start using CVAT'. I'll show you how to create a project, how to create a task, and how to do all the annotation. Go to Projects, click the plus button, and choose 'create new project'. This is going to be a keypoint detection project, and its name is going to be
'quadruped keypoint detection', which is exactly what we are going to be doing. Then 'add label': I'm going to add 'quadruped', continue, and that's pretty much all; submit and open. This is where you add absolutely all the labels you have in your custom data; in my case I only have one label, which is quadruped. Now let's continue and create a task. Click 'create new task'; the name of this task will be something like 'quadruped keypoint detection task 001'. Then I'm going to add an image. I'm going to show you how to annotate this data using only one image, so I'll select just the first one, then click 'submit and continue'. We have to wait a couple of minutes until the data is uploaded to the server. Once everything is completed, go to Tasks: this is our project, and this is the task we have just created, so I'm going to click 'open'. This opens the task, and now we can start the annotation process. Click where it says 'draw new points' and select the number of points you are going to annotate; in my case that's 39 points, but select however many points you are going to annotate. Then click 'shape' and start annotating.

Something very, very important: when you annotate your keypoints, you need to follow a given order. If I show you this image again, you can see we have the locations of all the keypoints, but no information whatsoever about their order. You cannot follow a random order; you must follow the same order every time you annotate your data. This, for example, is the order I'm going to follow in this tutorial: the first keypoint is the nose, then the upper jaw, then the lower jaw, then mouth end right, and so on. You need to fix a given order for your data.

Now I'll start annotating. The first point is the nose, which I'm going to set over here. The next one is the upper jaw, which goes something like this; the lower jaw goes here; then mouth end right, and this is the right from the perspective of the animal, so it goes here. Next is mouth end left: I don't really see the mouth end left, but I'm going to say it's around here, and I'll share a few comments later in this tutorial about the visibility of keypoints; for now let's just continue. The next one is the right eye, then the right ear base, which is here, then the right ear end, which is over here, and I'm just going to continue through the whole list and resume the video when I'm done. These are the last two: body middle right, which is around here, and body middle left, which I don't see, but it's around here. And that's all of them; these are my 39 keypoints. Now let me show you how to export this data, but before that, please remember to click 'save';
it's always good practice to save your work. And not only do you need the keypoints: you also need to draw a bounding box around your object. This is very important, and I'll tell you why in a few minutes; for now, remember that besides annotating all of your keypoints, you also need to draw a bounding box enclosing your object. This is how I did it, and I'm going to click 'save' again. This is the only image I'm going to annotate, but please follow exactly the same process for all of your images.

Now go to Tasks and let's export this data. Click here, then 'Export task dataset'. You can see there are many different formats you can export your data in, and one of them is 'COCO Keypoints 1.0'. This matters because it's exactly the format we need, but I've tried exporting to it and for some reason it's not working, so I'm going to show you how to do it with 'CVAT for images 1.1'. Click here, then OK, and wait until everything is downloaded. Once the export is complete you'll get a zip file, and inside it there will be a file called annotations.xml. If I open this file, at the bottom you'll see all the images you annotated, together with their annotations. This is exactly the data CVAT generates.

Now let me show you something else. I created a Python project for today's tutorial, and in it there is a script that will be super useful, because now that you have your annotations you need to convert them into the exact format YOLOv8 expects for pose detection. You basically need to set two variables: the location of your annotations.xml file, and the directory where you want the converted data to be saved. The script parses the XML file, extracts all of your annotations, and saves them in the format YOLOv8 needs. Once you have set those two variables, the only thing left is to run the script and everything will run smoothly. This script is available in the GitHub repository of today's tutorial, so you can just go ahead and use it to convert your data; a rough sketch of what such a script does is below.
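The actual script lives in the repository and may differ from this; what follows is only a minimal sketch of the idea, assuming CVAT's 'for images 1.1' layout of <image> elements with <box> and <points> children (treat the element and attribute names, as well as the paths, as assumptions to check against your own export):

# sketch: convert a CVAT "for images 1.1" annotations.xml into YOLOv8 pose txt files
import os
import xml.etree.ElementTree as ET

XML_PATH = 'annotations.xml'   # location of your CVAT export
OUT_DIR = 'labels_out'         # where the YOLO-format txt files go

os.makedirs(OUT_DIR, exist_ok=True)
root = ET.parse(XML_PATH).getroot()

for image in root.iter('image'):
    w, h = float(image.get('width')), float(image.get('height'))
    box = image.find('box')      # the bounding box drawn in CVAT
    pts = image.find('points')   # the 39-point skeleton shape
    if box is None or pts is None:
        continue
    # CVAT stores the box as corner coordinates; YOLO wants a normalized center x/y plus width/height
    xtl, ytl = float(box.get('xtl')), float(box.get('ytl'))
    xbr, ybr = float(box.get('xbr')), float(box.get('ybr'))
    bw, bh = xbr - xtl, ybr - ytl
    row = [0, (xtl + bw / 2) / w, (ytl + bh / 2) / h, bw / w, bh / h]  # class id 0 = quadruped
    # keypoints arrive as "x1,y1;x2,y2;..." in annotation order; emit x, y, visibility
    for pair in pts.get('points').split(';'):
        x, y = (float(v) for v in pair.split(','))
        row.extend([x / w, y / h, 2])  # 2 = labeled and visible (adjust if you track visibility)
    name = os.path.splitext(image.get('name'))[0] + '.txt'
    with open(os.path.join(OUT_DIR, name), 'w') as f:
        f.write(' '.join(f'{v:.6f}' if isinstance(v, float) else str(v) for v in row) + '\n')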
Now let's continue: I'm going to show you how to structure your data and your file system so it complies with YOLOv8. This is a directory called 'data', and it's the root directory where my data is located; you need one root directory like this where your data lives. Within this root directory I have two folders, one called 'images' and the other called 'labels', and it's very important that you name them exactly like this. If I open 'images', there are two more folders, one called 'train' and one called 'val', and again the names matter: 'train' holds all the images we're going to use as training data, and 'val' holds all the images we're going to use as our validation set. The 'labels' folder looks the same: it also contains two directories called 'train' and 'val', and if I open labels/train you can see many, many txt files, which are the labels for the training data. For absolutely every single image in images/train there is a corresponding txt file in this folder, and the same goes for 'val' with the validation images. So, to recap: the root directory contains 'images' and 'labels', each of those contains 'train' and 'val', and that's where all the images and all the labels live. This is exactly how you need to structure your file system; a sketch of the layout is below.
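To make the layout concrete, here is the expected tree (the root name 'data' is from the video; the file names are just placeholders):

data/
├── images/
│   ├── train/
│   │   ├── antelope_0001.jpg      # placeholder names
│   │   └── ...
│   └── val/
│       └── ...
└── labels/
    ├── train/
    │   ├── antelope_0001.txt      # one txt per image, same base name
    │   └── ...
    └── val/
        └── ...

The hard requirements are the folder names ('images'/'labels' at the top, 'train'/'val' inside each) and that every image has a label file with a matching base name.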
Now let me show you one of these label files from the inside. This is a random annotations txt file, and this is exactly how the data inside these files needs to look: the annotations are specified in the COCO keypoint format, which is a very popular format for pose detection. I'm going to edit the file a little so it's easier to walk through (obviously I won't save the changes). The first number is a 0, and this is our class ID. In my case I only have one class, quadruped, so this number will always be zero; but if your project has several classes, remember that this number is the class ID, so different classes will have different numbers here.

The next four numbers are the bounding box of your object. Remember that in CVAT we annotated not only the keypoints but also a bounding box; these four elements after the class ID are that box, specified in the YOLO format: the x and y position of the center of the bounding box, then its width, then its height.

Then come all the other numbers, and at first they look very strange: two floats followed by the number 2, then the same pattern again, then two more numbers and another 2, then three zeros... So let's go back to my browser and look at cocodataset.org, because that's where this format is explained. In the keypoint detection section you can read that every single keypoint is specified as x, y, and a visibility flag v. So for every keypoint we have three values: the x and y position of that keypoint, plus the visibility v (remember, I said we'd talk about visibility later in this tutorial; this is that moment). v has three possible values: v=0 means the keypoint is not labeled, in which case x and y are 0 too; v=1 means the keypoint is labeled but not visible; and v=2 means the keypoint is labeled and visible.
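To make the layout concrete, here is a made-up label line for a hypothetical object with only three keypoints (in this dataset, with 39 keypoints, each real line has 5 + 39×3 = 122 numbers, and all values are normalized to the image size):

0 0.512 0.438 0.610 0.520 0.455 0.301 2 0.472 0.305 2 0.000 0.000 0

Reading it left to right: class ID 0 (quadruped); bounding box center x, center y, width, height; then one x, y, v triplet per keypoint; here two visible keypoints (v=2) followed by one that is not labeled (v=0, with x and y zeroed out).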
Back in the annotations file: if we start over here, we have two numbers and then a 2, which means this keypoint is labeled and also visible; the next two numbers and another 2 mean that keypoint is visible too; and when we reach the three zeros, we are in the v=0 situation, with x and y equal to zero as well, meaning the keypoint is not labeled in this image. Long story short: after the bounding box, all the other numbers are the keypoints, with three values per keypoint, the x and y position plus the visibility.

That is one possible format, and it works just fine, but YOLOv8 also supports keypoints specified with only two values. If you don't have visibility information for your keypoints, it doesn't matter: YOLOv8 also accepts keypoints given as only their x and y coordinates. So, to summarize: the first number is the class ID, the next four numbers are the bounding box, and all the remaining numbers are the keypoints, each specified either with three values (x, y, and visibility) or with only two (x and y). That's how you need to label your data and structure your annotations; please do it this way, otherwise it's not going to work. Now I'm going to press Ctrl+Z, because obviously I'm not saving those changes. And that's pretty much all about formatting your data, your file system, and your annotations to train this pose detector with YOLOv8.

Now let's go back to PyCharm, to the project I created for today's tutorial. The first thing you need to do in order to train this pose detector with YOLOv8 is install the project's requirements, which is basically ultralytics; please install this package before starting, because otherwise you won't be able to train anything. Once ultralytics is installed, open the file I created, train.py; I'm going to show you exactly what to put in it. Go to the Ultralytics website, open the Pose page, and scroll down to the training section: the only thing I'm going to do is copy and paste the two lines there, one that loads the model and one that runs the training, and obviously add 'from ultralytics import YOLO' at the top. The model line we can leave at its default value, but in the training call I'll make a couple of changes: I'm going to train for only one epoch for now, and I'm going to point it at my configuration file, config.yaml. With those changes, train.py looks roughly like the sketch below.
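A minimal sketch of train.py as assembled in the video (the checkpoint name follows Ultralytics' pose-model naming, and imgsz=640 is the value from the docs example; the epoch count and config path are the tutorial's choices):

from ultralytics import YOLO

# load a pretrained YOLOv8 pose model (nano variant here; pick the size you need)
model = YOLO('yolov8n-pose.pt')

# train on the quadruped data; one epoch is only a quick smoke test
results = model.train(data='config.yaml', epochs=1, imgsz=640)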
Now let me show you how this config.yaml looks. The configuration file has three sections: one for the data, then the keypoints, then the classes. Let's go to the data section first: this is where you specify all the locations of your images and your labels. You basically need to set the root directory containing your data, which in my case is the one I showed you, the directory containing the 'images' and 'labels' folders, and then the locations of the training images and the validation images. If you structured everything the way I showed you a few minutes ago, you can leave those two lines at their default values and everything will work just fine; the only thing you need to edit is the location of your root directory.

Now let's go to the keypoints section. We have two keywords here, kpt_shape and flip_idx, and they are completely new for us; we haven't seen them in any of my previous YOLOv8 tutorials. In my case kpt_shape says [39, 3]: that's because I have 39 keypoints and I'm using the x, y, v format, three values per keypoint, which is why there's a 3. So the first number is how many keypoints you have, and the second is the format you're using: 2 if your keypoints are x, y only, or 3 if they are x, y, and visibility, as in my case.

Now let me explain flip_idx, and to do that I made a drawing. This is a random image from the dataset (actually the same image I used to demonstrate the annotation process): a quadruped with all of its keypoints drawn on top. Now look at what happens if I flip this image horizontally: it's exactly the same image, but everything that used to be on the right side is now on the left side, and the other way around. That's all a horizontal flip does. But remember we have many keypoints that are tied to one of the sides: a keypoint for the right eye, keypoints for the right ear and the right legs, and likewise for the left eye, the left ear, and the left legs. If we flip the image horizontally, we have to do something about all of those side-specific keypoints. When you train this type of model with YOLOv8, one of the stages of the training process is data augmentation, meaning the data is put through different transformations, and one of those transformations is flipping images at random; every time the augmentation performs
a horizontal flip, we get the situation I just described. So let's go back to the list of all the keypoints in this dataset; remember, it starts with the nose, then the upper jaw, then the lower jaw, and so on. Some of these keypoints are tied to the right side, some to the left side, and others, like neck base, neck end, throat, and back, are generic and not tied to either side. We need to do something with all the side-specific ones, and that's exactly what the flip_idx keyword does; that's the intuition behind it.

Let's go through the list. The first element is the nose: a nose is right in the middle, so nothing happens to it when we flip the image; the nose remains the nose. The next element is the upper jaw, exactly the same, and so is the lower jaw. But when we get to 'mouth end right' there is an issue: after a horizontal flip, the mouth end right becomes the mouth end left, and the next element, 'mouth end left', becomes the mouth end right. These two keypoints swap when we flip the image. Now look at the value of flip_idx: it starts 0, 1, 2, 4, 3. Instead of 3, 4, which would be the natural order, we have 4, 3: we are swapping those two values, and those are exactly the indexes of those two keypoints in the keypoint order. Long story short, to fix the problem horizontal flips would otherwise cause, we go through all of our keypoints and, for every keypoint tied to the right side, swap it with its left-side counterpart; that's the only thing we need to do, and it's what this list specifies: how the keypoint indexes are exchanged when an image is flipped horizontally. Please be very, very careful with this list.

Now let's move to the last section, the names of your objects. In my case I only have one object, quadruped, so this part is very simple, but remember to specify the names and class IDs for absolutely all of your classes; in my case the only class ID is 0, meaning quadruped. And that's pretty much all for the config.yaml; a sample of what it looks like is below.
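A sketch of the config.yaml described above (the path is a placeholder, and only the first few entries of the 39-element flip_idx are spelled out, so the '...' must be replaced by the remaining entries before this is valid YAML):

# data section: root directory plus train/val image folders relative to it
path: /home/user/data        # placeholder -- set this to your own root directory
train: images/train
val: images/val

# keypoints section: 39 keypoints, each stored as x, y, visibility
kpt_shape: [39, 3]
flip_idx: [0, 1, 2, 4, 3, ...]   # swap right/left pairs; complete all 39 entries

# classes section
names:
  0: quadruped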
Now let's go back to train.py and continue. Once you have this configuration file and everything we set up here, the only thing left is to execute the script, and that's all: that's how easy it is to train this model. But I'm going to stop the training, because running it locally would take a very long time; I've already done some tests, and it would take forever on my machine. Still, this is exactly the process to follow if you want to train in your local environment. I also said I'd show you how to train in a Google Colab, so let's go to my browser and see how to do this training from Colab.

The first thing to do is go to your Google Drive and upload absolutely all of your data, because otherwise Colab won't be able to reach it, along with your config.yaml file. The config is exactly the same as the one I showed you on my local computer, except that you need to edit the 'path' field to point at your data's location in Google Drive; this is very important, otherwise nothing will work, and I'll show you in a moment how to find that location. Now to the Google Colab I created for this tutorial (you will find this notebook in the GitHub repository as well). It has only a few cells, and I'm going to execute them one by one. The first one connects the Colab environment to Google Drive: select your account, scroll all the way down, click 'allow', wait a few seconds, and everything is mounted under /content/gdrive, so Colab can access the data in your Drive. The next step is installing ultralytics, the Python package we need for YOLOv8; just execute that cell. For the next cells you need to know where your data lives in your Drive, so I execute a cell that lists all the files in my Drive's root directory; these are many, many files, and I just need to find the folder with my data, run ls again to see its contents, and keep going until I've located my data directory, which contains the two folders, images and labels. The cells look roughly like the sketch below.
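A sketch of the notebook cells as described (the Drive path is a placeholder for wherever your data actually lives; the copy destination in the last cell mirrors the video's idea of saving results back to Drive):

# cell 1: mount Google Drive so the notebook can see your data
from google.colab import drive
drive.mount('/content/gdrive')

# cell 2: install the package we need for YOLOv8
!pip install ultralytics

# cell 3: poke around until you find your data directory
!ls /content/gdrive/MyDrive
!ls /content/gdrive/MyDrive/my-data-folder        # placeholder path

# cell 4: run the training, pointing at the config.yaml you uploaded
from ultralytics import YOLO
model = YOLO('yolov8n-pose.pt')
model.train(data='/content/gdrive/MyDrive/my-data-folder/config.yaml', epochs=1, imgsz=640)

# cell 5: copy the results into Drive so you can download them
!cp -r /content/runs /content/gdrive/MyDrive/my-data-folder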
Now that you know where your data is located in your Google Drive, copy that path into your config file: come back to the 'path' field and set it to wherever your data lives, and also set the location of your config.yaml in the training cell. Once that's done, you're all set; just execute the cell. Everything takes some time: the first thing it does is download the pretrained weights into the Colab environment, then it reads all the data, and then it runs the training.

Once the training process has completed, you can see the results have been saved under runs/pose/train4, so now I execute the cell that copies the entire contents of the runs directory into Google Drive. Remember, the whole point of this training is to get the weights and the results out, and the easiest way is to copy everything into Google Drive and then download it from there; just remember to edit that path to wherever you want everything copied. If I go back to my Google Drive, you can see the runs directory that was just generated, and within it the train4 folder with the results of the training we just executed. Everything looks fine, and all that's left is to download this directory to our local computer.

Now, remember this was a very dummy training, only one epoch; one epoch is very unlikely to be sufficient, so you will need a much deeper training if you really want to train your pose detector on your data. In my case I had already trained the model on this data for 100 epochs before starting this tutorial, so everything is already trained and we can just analyze the results. Let's move to my local computer so I can show you how to validate the model trained with YOLOv8.

You can see there are many different plots here, a lot of information, but we are going to focus on the loss functions, and specifically on the pose loss: the pose loss on the training set and the pose loss on the validation set. Looking at the training set, the loss is going down, and not only is it going down, I'd say it's going to keep going down for many more epochs: the trend is clearly downward, we haven't reached a plateau, and I'd say we are very far from one. This is a very good sign: the training process is going well, and it means the model has
extra capacity: we could continue training this model and it would keep learning from this data. That's what I take from the pose loss on the training set. Looking at the same function on the validation set, it's also going down, which is a good thing, but I have the impression it's starting to plateau. It's not entirely clear, because it's happening right at the end of this training run, but it somewhat looks like a plateau is forming. At the very least the loss is clearly and unmistakably going down; what would have happened if I had trained for more epochs is unclear, but it seems we may have reached a plateau, and that's something to keep in mind during this validation process.

Now let's look at how the model performs on some images. The way I'm going to do it is this: I open one of the batches from the validation set with the labels, the ground-truth annotations, not the predictions; I keep that open and open exactly the same batch with our predictions next to it. This is a very good way to analyze the results, because we have a lot of images and we need to look at many samples before drawing conclusions. Remember, this is a much more complex problem than my previous tutorials with the image classifier, the object detector, and the image segmentation model, so the validation process will be more involved as well.

Out of all these images I'm going to focus on just one, this dog, this Dalmatian. On one side are all the ground-truth keypoints, our annotations, and on the other all our predictions, and it looks very good. The keypoints around the face I would call perfect; these keypoints over here are very good as well, these three are also very good, and these two too. But looking at the legs, something is going on: I don't really see these two keypoints, so we are not detecting the legs, and looking at the legs as a whole,
I think we have an issue there. You can also see that the keypoint at the end of the tail isn't being detected either, so we have an issue at the tail and an issue around the legs, but everything else I'd say is pretty good. I don't know what you think, but I think it's pretty good.

This was one example; now let me show you another one, from another batch, again with the annotations on one side and the predictions on the other. Look at this rhinoceros: the situation is similar. Around the face everything is fine, a very good detection; over here we have a very good detection too; these three points are well detected; over here everything is okay; and then again we have an issue around the legs, where not all the keypoints are detected properly, the same with this other leg, and the same at the tail, this end-tail keypoint. Everything else seems to be working properly, but the legs and the tail are a problem.

Here's one more example: in this one the animal is in a different posture, so it's a bit more challenging for the model, and the face is detected very well; actually we're missing this one eye, but all the other keypoints around the face are well detected. These three keypoints over here are okay, this one and this one too, and many other keypoints are also well detected, but again there's an issue around the legs. And that's pretty much what I noticed by looking at many of these examples. The situations differ, different animals, different postures, different everything, but after inspecting a number of images my impression is that the model performs very well overall, with a weakness around the legs and around the tail.

Combining all this information, everything we saw in the images about how the keypoints are detected, with the loss curves on the training set and the validation set, my conclusion would be to do a deeper training, to train this model for even more epochs, and I'm curious to see what happens in that situation. Looking at the training loss, I really like what I see: I think this model has more capacity, and if we trained for, say, 50 or 100 more epochs, I think the training loss would keep going down the same way; it seems we are very far from a plateau there. But looking at the validation loss, I'm not completely sure what happens from here on. So what I would do to try to improve these results
is to continue training for more epochs, see what happens to this loss, and then analyze these images again; that would be my next step. This is a very good example of how to analyze your model, your data, and your plots in a more complex setting, because remember what we are trying to learn now is far more complex than in our previous tutorials: not a bounding box or a mask, but the entire structure of a quadruped. Trust me, it's much more complex than everything we've done so far. So when the problem is a little more complex: look at the loss functions, what's going on in the training set and the validation set; look at some examples; and then draw your conclusions. In my case, the conclusion is to train for more epochs.

Also, remember that we are using the default values for this training; the only values we are specifying are the image size and the number of epochs, and that's it. If I show you the full configuration file being used, you can see there are many, many hyperparameters. So if a deeper training isn't enough, another next step would be to play around with the different hyperparameters and find a combination that works better for our use case. That's important, because when you are approaching a more complex problem like this one, I'd say it's not very realistic to expect everything to go well on the first attempt with all the default values; most likely you will need to experiment with the hyperparameters and find a combination that suits your problem, your project. That's what I can say about this validation process.

Now let me show you something else: where the weights are located. Within this folder you will see another folder called 'weights', and within it two files, best.pt and last.pt; these are the models generated by this training process. I've mentioned this in previous tutorials, but I'll say it again: last.pt is the model as it was at the end of the training process, and best.pt is the best model from the entire training run. You have these two models and can choose whichever you like the most; what I usually do is take last.pt, which I consider the more robust model, so that's the one I usually use for my predictions. And that's pretty much all about validating this model; now it's time to make our predictions.

Let's get back to PyCharm and to this file called inference.py, which is the file we are going to use to make predictions with the model we just trained. Let me show you how to do it; the finished script is sketched below, and then I'll build it up step by step.
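A sketch of where inference.py ends up by the end of the video. The model and image paths are placeholders; note that the video builds this with result.keypoints.tolist(), which matched the ultralytics version at the time, while newer versions wrap keypoints in a Keypoints object, so the .data access below is an assumption to verify against your installed version:

import cv2
from ultralytics import YOLO

MODEL_PATH = 'runs/pose/train4/weights/last.pt'   # placeholder: path to your last.pt
IMAGE_PATH = 'samples/wolf.jpg'

model = YOLO(MODEL_PATH)
result = model(IMAGE_PATH)[0]   # one image in, so take the first (only) result

image = cv2.imread(IMAGE_PATH)
# keypoints come back per detected object; each keypoint row is [x, y, visibility]
for object_keypoints in result.keypoints.data.tolist():
    for keypoint_index, keypoint in enumerate(object_keypoints):
        cv2.putText(image, str(keypoint_index),
                    (int(keypoint[0]), int(keypoint[1])),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

cv2.imshow('image', image)
cv2.waitKey(0)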
I'm going to start with the import: from ultralytics import YOLO. Then I define my model path, the location of the model we just trained; in my case that's this one, and I'm selecting last.pt. I also set the path to an image I'll use to demonstrate the predictions; my image is located at samples/wolf.jpg. Let me quickly show it to you: if we go to samples, this is the image of a wolf, which is obviously a quadruped, so it's a great image for showing how to use this model. Back in PyCharm, I define the model as YOLO(model_path), then results = model(image_path), and I select the first element, because since we are predicting only one image the first element is all we need. Then it's just a matter of iterating: for result in results, and then for keypoint in result.keypoints.tolist(). For now the only thing I'm going to do is print the keypoints, so we make sure everything is okay, and let's see what happens... okay, it seems I have an error, and I think I know what it is: tolist goes without the underscore. Let's see now... okay, now everything is fine.

Next I import cv2, because I'm going to read the image and plot all the keypoints on top of it; that's going to be a very good way to show these predictions. So cv2.imread(image_path) gives us the image, and now I'm going to call cv2.putText: maybe it's a good idea to put the number of each keypoint on top of it. The first argument is the image, and then the keypoint number, which
we get like this: I enumerate, so it becomes keypoint_index, keypoint in enumerate(...), and the text is str(keypoint_index). Now the location: remember how my keypoints look, three values each, and the ones we care about right now are the first two, the x, y position of the keypoint, so I pass int(keypoint[0]) and int(keypoint[1]). Then the font, which I'm going to set to cv2.FONT_HERSHEY_SIMPLEX; then the font size, which I'll set to 1 for now; then the color... something's not right, let me see, I think I'm not closing these brackets, that's the reason... there, now everything's okay. The color I'm going to set to green, and then the text thickness, which I'll set to 2, and that's all for now.

Now let's see how it looks: I call cv2.imshow with the image, then cv2.waitKey(0), and that's pretty much all. What we are doing here is plotting the image and drawing all the keypoints on top of it, with the keypoint number on each one, so it's easier to know exactly what we are detecting across the entire image. The result looks pretty good, but I'm going to improve the visualization a little by setting the font size to 0.5 and pressing play again. Now the visualization is a little better, and you can see that everything looks pretty good: we are plotting all the keypoints on top of our image, and this is exactly how you make predictions with YOLOv8.

The last thing I'm going to show you is the file with the class names of the keypoints, so we can look at exactly what we are detecting. You can see that 0 is the nose, then upper jaw, lower jaw, mouth end right, mouth end left, and so on. For example, 21 lands somewhere around here, which is back middle, and that makes sense; 37, around here, is body middle right; and 36 is belly bottom. So everything looks pretty good, although we're still seeing the issues we noticed in the other pictures: the legs are not detected very well, and neither is the end of the tail, but everything else looks fine.

So this is going to be all for today. If you enjoyed this video, I invite you to click the like button, and I also invite you to subscribe to my channel. My name is Felipe, I'm a computer vision engineer, and these are exactly the type of videos I make on this channel. This is going to be all for today, and see you on my next video. [Music]
Info
Channel: Computer vision engineer
Views: 21,141
Id: gA5N54IO1ko
Length: 52min 26sec (3146 seconds)
Published: Wed Apr 26 2023