Tennis Shots Identification and Counting using YOLOv7 Pose Estimation and LSTM Model

Video Statistics and Information

Captions

Hello everyone. In this video tutorial we will see how to do tennis shot identification and counting using YOLOv7 pose estimation. The whole project can be divided into multiple steps, which are given below.

In the first step, we will download the YOLOv7 pose estimation weights. We will be using the YOLOv7 pose estimation weights for the keypoint predictions. In the next step, we will also download the SF Pro .ttf font. We need this font to display the shot type and the total shot count in our user interface. Then I will create a virtual display in Google Colab. Why do I need a virtual display? Because I want to show the output demo video inside the Google Colab notebook.

Next, I will download a trained LSTM model from Google Drive. I have already trained the LSTM model on backhand and forehand ground stroke data and saved its weights to my Drive, so I will download the .h5 file of the LSTM model from there. In the fourth step, I will download some sample videos from Google Drive for testing the trained LSTM model for tennis shot identification. In the fifth step, I will download the tennis shot identification and counting script from Google Drive.

While testing, we pass an input video to the tennis shot identification and counting script. What happens next is that the input video is passed to the YOLOv7 pose estimation model and the keypoint predictions of the body are obtained. (Let me just add that here: YOLOv7 pose estimation model.) The keypoint predictions are then passed to the trained LSTM model for the shot
prediction, that is, whether it is a forehand ground stroke or a backhand ground stroke. I will also show you how to train the LSTM model on your own data. If you want to detect any other shot, or work with other data such as cricket shots, I will tell you how to train the LSTM model for any kind of data as per your requirements.

In the first step, we download the tennis shot identification files from Google Drive. I have already downloaded this file, so let me disconnect and delete this runtime and start from scratch. Okay, I'm just connecting it. Before running the script, please make sure that you have selected GPU as the hardware accelerator in the runtime settings, and click Save. Then you need to download the tennis shot identification files from Google Drive. They are downloaded in zip format, which you can see over here, and we unzip them. The archive contains the SF Pro .ttf font file, and it also contains the plots file. Let me show you: in that folder we have the YOLOv7 pose estimation weights, the SF Pro .ttf font, and the plots file, which helps us plot the skeleton on the body.

Then we need to import the required libraries. We need the NumPy library in this case to convert our list into a NumPy array. Over here we are creating a virtual display in Google Colab so that we can show our output demo video in the Colab notebook. Here I am upgrading the Pillow library. Pillow is used when we want to draw a rounded rectangle, that is, a rounded bounding box, because if you see
that we normally get a square-cornered rectangular bounding box when an object is detected, but if you want a rounded-rectangle bounding box, then we use the Pillow library. That's why I'm upgrading Pillow over here.

Then we download the trained LSTM model file from the Drive, the final model file in .h5 format. The LSTM model is trained for the backhand ground stroke and the forehand ground stroke. Here I am downloading some sample videos from Google Drive to test my trained LSTM model. I got these sample videos from YouTube and from the Pexels website and placed them on the Drive, so I can download them directly into my Google Colab notebook. Now we download the coco.names file and the updated plots file. We also download the tennis shot identification and counting script. When we run this script, it does the tennis shot identification as well as the counting.

Let me run this script on a demo video. We set the device to 0, so we are using the GPU, and this is the name of the video file on which we are running the tennis shot identification and counting script. I will explain the script after this as well; let me first run it and show you the demo output, and then I will explain the whole process.

All the processing is done frame by frame, one by one. Okay, so it's at 29 frames. Let me show you one thing over here. You can see that after 30 frames (it was at 29, and now it is on the 30th frame) we got this sequence, and this is the prediction of the current stroke: the player is currently playing the forehand ground
stroke. So this is the prediction for the current stroke, that is, which stroke the player is playing, and this is the sequence of body keypoints, showing whether the body is moving towards a backhand ground stroke or a forehand ground stroke. So this is the body keypoint prediction, and this is the stroke the player is currently playing.

After every 30 frames you get this prediction. You can see here that after 59 frames we get the next prediction on the 60th frame: the sequence of keypoint predictions while the player is playing the forehand ground stroke, and the stroke name. This is the current stroke name, which is the forehand ground stroke, and the names of the previous and current strokes are saved in this list, which we have created over here.

In this way, a prediction is made after every 30 frames. You can see it is now at frame 89, and here we have the prediction that the player is currently playing the backhand ground stroke, along with the body keypoint predictions for this backhand ground stroke. This shot name is also saved in the list: these two are the previous shot names, and this is the current shot name. The last name in the list is always the current shot name. And these are the keypoint predictions of the body while the player is playing that specific shot. This is done after every 30 frames, so let's wait for the video to be processed completely, and then I will show you the output demo video.

Well guys, we have processed the complete video, and you can see here that the predictions are made after every 30 frames. Here is the current stroke which the player is playing, the backhand ground stroke, and in this list we have all the
previous strokes. All the previous strokes are saved over here, as you can see. I have also displayed the demo output video in our Google Colab notebook. Let me download this video and show you how it works. Let's play the output demo video and see what our results look like. Let me just move my screen over to the output demo video. Okay, I think you can see it now; this is the output demo video.

Let me tell you one thing: we are only detecting the shot type for this one player. We are not detecting the shot type for this other player. We only detect the shot type for the player that has the bounding box and the label "person". Here we are detecting the shot type as forehand ground stroke, and the total shot count appears here as well.

Let me play the output video. You can see that the player has just played the backhand ground stroke. As I said, I am only detecting the shot type and the total shot count for this player. Now you can see that the player is playing the backhand ground stroke: "backhand ground stroke" appears over here, and you can also see the count increment from two to three.

Why are we not detecting the shot type and the total shot count for this other player? Because, if you look over here, in some cases we are not getting the poses for this player, that is, the pose estimation or body keypoints, whereas this player is close to the camera and we are getting the correct keypoints of the body. You
can see that over here, while we are not getting the other player's keypoints in every frame. You can see that in this frame we are missing the body keypoints of that player because he is far away from the camera. So instead of letting our model give wrong predictions for that player, we focus on the shot type and the total shot count for this player only. We are only interested in detecting the shot type and the total shot count for this player, not for the other one.

You can see that the player is playing the forehand ground stroke, and our model is correctly detecting that. And now his pose moves towards the backhand ground stroke, and here you can see that the model has correctly detected that the player is playing the backhand ground stroke. So this is how our output video looks. Let me go back to the code, and then we can discuss further.

Here you can see that we have trained the LSTM model to get the shot label, that is, whether the player is playing the backhand ground stroke or the forehand ground stroke. So let's look at the complete process of how to train our LSTM model.

In the first step, a dataset of different tennis videos was gathered from YouTube. I have downloaded the tennis videos from YouTube and saved them in my local directory. Let me show you; give me a minute while I navigate to the files. So if you see
this dataset folder on your screen, you can see the different tennis videos which I have downloaded from YouTube. I will share this dataset with you as well. Then I trimmed these videos into backhand ground stroke clips and forehand ground stroke clips. You can see the clips I have trimmed for the backhand ground stroke over here, and the clips I have trimmed for the forehand ground stroke over here. I trimmed the videos to the moments when the player is playing the backhand ground stroke or the forehand ground stroke, and I saved those clips in separate folders. Let's go back to the code.

So, in the first step, a dataset of different tennis videos was gathered from YouTube, as I have shown you. In the second step, the time spans in which the player is playing the backhand ground stroke or the forehand ground stroke were trimmed and saved into separate folders, as I have also shown you. In the third step, each video frame was sent to the YOLOv7 pose estimation model, and the predicted keypoint landmarks (x coordinate, y coordinate, and confidence) were extracted and stacked together as a sequence of 30 frames.

Let me show you this third step. Here I am downloading the keypoints finder .py script, which finds the keypoints of the body while the player is playing a specific shot. Here I have passed in video 1 (an .mp4 file). This is a video in which the player is playing the backhand ground stroke. In video 1 we only have the video in
which the player is playing the backhand ground stroke. In video 1 the player only plays backhand ground strokes; video 1 is a clip trimmed from the original video.

When we run the keypoints finder script, you can see that we get the keypoints of the body after every 30 frames while the player is playing the backhand ground stroke. This list holds the body keypoints for the first 30 frames, and this list holds the body keypoints after 60 frames. In the list below, we have all the keypoints saved, from frames 1 to 30 and from frames 30 to 60. So these are the keypoints of the body which we got from the whole video. It is a very long list, so we save it in the form of a NumPy array.

You can see the shape of the NumPy array. In the current video, video 1, we have 10 different sequences, each with a length of 30 frames. So 10 is the number of sequences, 30 is the number of frames in each sequence, and 51 represents the keypoints of the body: 17 values are for the x coordinates, 17 are for the y coordinates, and 17 are the confidence values.

If I show you the output demo video over here, you can see that it contains only the keypoints of the body while the player is playing the backhand ground
stroke. In this case the player is only playing the backhand ground stroke, as you can see. So we have found the keypoints of the body only while the player is playing the backhand ground stroke, and we have saved those keypoints in the form of an array, whose shape you can see over here.

In this way, we run multiple videos and find the keypoints of the body while the player is playing the backhand ground stroke. You can see that we are now doing video 2, now video 3, and now video 4. We are collecting body keypoint data while the player is playing the backhand ground stroke. We have collected data from seven videos of the backhand ground stroke, and we are saving this data in the form of an array.

Now we collect the data for the forehand ground stroke. We pass different videos in which the player is playing the forehand ground stroke, collect the data, and convert it into a NumPy array. These steps are much the same, so if you look through them you will understand how we are doing it.

Let me show you what data we finally have; just give me a minute. Here, the final backhand ground stroke data contains the data we collected while the player was playing the backhand ground stroke, and the final forehand ground stroke data is the data we collected while the player was playing the forehand ground stroke. The total data shape is 203, 30,
and 51.

Now we import the libraries required to train the LSTM model. Here we do the train and test split: we use 80% of the data for training and 20% for testing. And here is our LSTM model. We have four LSTM layers and three dense layers, and we use the Sequential model to implement it. Here we train the LSTM model. I have already trained the model for 500 epochs, as you can see over here, and we are getting a categorical accuracy of 98.77% on the training data and a validation accuracy of 73.17%, which is very good.

After training, I save the model in .h5 format. In the script above, I told you that I download the model from Google Drive. After saving this model in .h5 format, I downloaded it and placed it in my Google Drive, so whenever I need the model in a Google Colab notebook, I download it directly from Google Drive. So this is the whole process.

Let me now explain the tennis shot identification and prediction script. If you understand the tennis shot identification and prediction script, then the keypoints finder script will be no trouble for you. Here we have the tennis shot identification and counting .py script, on which we ran our demo video. Let me open this script over here. I have already added comments, so if you check the comments it will be very easy for you to understand. I'm opening the script and I will explain the complete script; just give me a minute. Okay, so I'm just
expanding it a bit on the left side. In the first step we import all the required libraries, which you can see over here. Then I create an empty dictionary named object_counter. Here I have written a comment: create an empty dictionary to save the total shot count of each type of stroke. The first element in the dictionary is the key, which holds the shot type, for example backhand ground stroke or forehand ground stroke. The second element is the value, which holds the shot count for that type of stroke. For example, the forehand ground stroke is played by the player 10 times, while the backhand ground stroke is played four times. So this dictionary contains the shot names and the total shot counts.

Then I create a function named load_classes, which you can see here. Using this function we load the coco.names file and read each of the object names in it. The load_classes function returns all the object names in the coco.names file in the form of a list.

Now this is our main function. Here we pass the YOLOv7 pose estimation weights. By default we set the device type to CPU, but we use the GPU when we do the predictions. Here is our coco.names file, and here I am defining the thickness, that is, the thickness of the skeleton lines on the body. Let me show you what I mean by the thickness. By thickness I mean this.
Okay, so I mean the thickness of these keypoint lines. I am setting this to two, but if you increase it to four or five, these lines become thicker. I am talking about these lines, the yellow and blue ones. Let's go back to the code again.

Here we set up the video capture, pass the video path, and define the height and width of the output video. And here I load the SF Pro .ttf font which we downloaded. Let me show you where we use the SF Pro .ttf font. Here you can see that this text is written in the SF Pro font, and so is this text, and this text, and this text as well. Let's go back to the code again.

Here we load our trained shot identification model. We saved the model in .h5 format, so here we load the trained shot identification model, which is the trained LSTM model whose training I explained below. Here we create an empty list, and here we define the strokes. We are detecting two strokes: the backhand ground stroke and the forehand ground stroke.

In the first step we find the keypoints of the body for player two. Let me share my screen; just give me a minute. If you look at my screen, this is our player one and this is our player two. I have written it as well: this is our player two and this is our player number one.
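
As a rough illustration of the bookkeeping described so far (the list of strokes, and the dictionary that maps each shot type to its count), here is a minimal sketch. The names STROKES, stroke_from_scores and count_shot, the order of the two labels, and the example scores are assumptions made for illustration; the actual script may be organised differently.

```python
# Hypothetical names throughout; the actual script's identifiers may differ.
# The order of the labels is an assumption, not something stated in the video.
STROKES = ["Backhand Ground Stroke", "Forehand Ground Stroke"]


def stroke_from_scores(scores):
    """Return the stroke label whose class score is highest (argmax)."""
    best_index = scores.index(max(scores))
    return STROKES[best_index]


def count_shot(counter, shot_name):
    """Add the shot name to the dictionary, or increment it if already present."""
    if shot_name not in counter:
        counter[shot_name] = 1
    else:
        counter[shot_name] += 1
    return counter


# Key: shot type, value: how many times that shot has been played.
object_counter = {}

# Made-up class scores for three prediction windows, in the order of STROKES.
for scores in ([0.1, 0.9], [0.2, 0.8], [0.7, 0.3]):
    count_shot(object_counter, stroke_from_scores(scores))

print(object_counter)
# {'Forehand Ground Stroke': 2, 'Backhand Ground Stroke': 1}
```

Keeping the counts in a plain dictionary means the on-screen totals can be read straight from it, one entry per shot type.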
Okay, let me go over here. First we find the keypoints of the body for player two. Here we create a black mask, and then we define the ROI for player two. ROI means region of interest. What does that mean? It means we define these corner coordinates. To define the ROI for player two, we create a rectangle, and for that we use the coordinates x1, y1 and x2, y2.

How do we get these points? Please focus here while I explain. If I go over here, you can see that the current coordinates of x1 and y1 are 141 and 164. And if I go towards this point, the current coordinates of x2 and y2 are 1055 and 356. If I go to the code, here I have passed the x1, y1 coordinates and here I have passed the x2, y2 coordinates.

Sorry, I was just talking about this player, and this is player one. Currently we are using these other coordinates for player two: these are the x1, y1 coordinates for player two and these are the x2, y2 coordinates for player two. If you go over here and look at this point, it currently reads 73, 371 [unclear], and this one currently reads 1200, 657 [unclear].

Let me go back to the code again. So first we find the region of interest for player two, and then we calculate the skeleton points for player two. Now here we find the keypoints, or skeleton points, for player number one. As I have shown you, player number one is the player above. Here we create a black mask and then define the region of interest for player one: here we have passed the x1, y1 coordinates and here the x2, y2 coordinates. In the next step, you
can see over here, if we go below, that here we plot the skeleton points for player one, and if I go further below, I plot the skeleton points for player two as well.

Now, we are only detecting the shot type and the total shot count for one player, player number two. As we process the video frame by frame, when the sequence length is equal to 30 we make a prediction. That is, we predict the stroke the player is playing after every 30 frames. If 30 frames have passed, we predict which shot the player is currently playing: the model.predict call makes the prediction after every 30 frames, as you can see over here. Then, for the pose name, we have defined the poses backhand ground stroke and forehand ground stroke, so here we get the stroke name, or pose name.

Then we save all the sequences in the keypoints list over here, and we save the pose names as well. Now, here we use the object counter that we created. We say that if the shot name is not in the object counter dictionary, then add the name. So if the shot name is not in the object
counter dictionary, then add the name. If the shot name is already in the object counter dictionary, then just increment the counter, which is what we do over here.

When the value of j becomes equal to the sequence length, which we have defined as 30, we remove all the previous values from the sequence list, set it to an empty list, and start processing the next frame. We have defined the sequence length as 30 over here; let me show you. And j counts the frames: the value of j stores how many frames have passed. We have defined the sequence length as 30, meaning that after every 30 frames we make a prediction. So when j becomes equal to the sequence length, 30, it means 30 frames have passed. Then we make the prediction, empty the sequence list, and set j equal to 1, so j starts from 1 again.

As I have shown you, after every 30 frames we make a prediction of which stroke the player is currently playing, the forehand ground stroke or the backhand ground stroke. Here we display the backhand ground stroke. If the player is currently playing the backhand ground stroke, we set these values for where we want to display it in the UI. And if the player is playing the forehand ground stroke, we adjust where these values, or the line, go in the UI, and we put the text saying that the player is currently playing the forehand ground stroke. And here we are just defining the
total shot count, that is, where the total shot count appears in the UI. Let me show you. In all of this code below, we are just setting where in the UI to display the total shot count and where to display the shot type. So we are just positioning the shot type and the total shot count in the UI.

That is all from the code. See you all in the next video tutorial. Till then, bye bye.
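
The 30-frame prediction loop described in the walkthrough can be sketched roughly as follows. This is only an illustration under stated assumptions: run_windows and stub_predict are hypothetical names, the stub stands in for the trained LSTM model, and the per-frame keypoint vectors are made-up lists of 51 numbers rather than real pose estimation output.

```python
SEQUENCE_LENGTH = 30  # frames per prediction window, as described in the video


def stub_predict(window):
    """Stand-in for the trained LSTM model (hypothetical, for illustration only)."""
    if window[0][0] > 0.5:
        return "Backhand Ground Stroke"
    return "Forehand Ground Stroke"


def run_windows(per_frame_keypoints, predict, sequence_length=SEQUENCE_LENGTH):
    """Collect keypoints frame by frame and predict once per full window."""
    sequence = []    # keypoint vectors of the current window
    shot_names = []  # previous and current shot names; the last one is current
    j = 1            # frame counter inside the current window
    for keypoints in per_frame_keypoints:
        sequence.append(keypoints)
        if j == sequence_length:
            # 30 frames have passed: predict, then reset the window.
            shot_names.append(predict(sequence))
            sequence = []
            j = 1
        else:
            j += 1
    return shot_names


# 95 fake frames of 51 values each; the value flips every 30 frames.
frames = [[float((i // 30) % 2)] * 51 for i in range(95)]
shots = run_windows(frames, stub_predict)
print(shots)
# ['Forehand Ground Stroke', 'Backhand Ground Stroke', 'Forehand Ground Stroke']
```

The last five frames do not fill a window, so no prediction is made for them, which matches predictions appearing only on the 30th, 60th and 90th frames.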

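The region-of-interest step from the walkthrough (create a black mask, then keep only the rectangle given by x1, y1 and x2, y2) can be illustrated with a small pure-Python sketch. The function name mask_outside_roi, the tiny 4 by 6 frame, and the coordinates used here are assumptions for illustration; the actual script most likely works on full video frames with an image library.

```python
def mask_outside_roi(frame, x1, y1, x2, y2):
    """Return a copy of the frame where everything outside the rectangle is black."""
    height = len(frame)
    width = len(frame[0])
    # Start from a black mask of the same size as the frame.
    masked = [[0] * width for _ in range(height)]
    # Copy back only the pixels inside the region of interest.
    for y in range(max(0, y1), min(height, y2)):
        for x in range(max(0, x1), min(width, x2)):
            masked[y][x] = frame[y][x]
    return masked


# Tiny 4x6 grayscale "frame" with every pixel set to 255.
frame = [[255] * 6 for _ in range(4)]
roi_only = mask_outside_roi(frame, x1=1, y1=1, x2=4, y2=3)
for row in roi_only:
    print(row)
# [0, 0, 0, 0, 0, 0]
# [0, 255, 255, 255, 0, 0]
# [0, 255, 255, 255, 0, 0]
# [0, 0, 0, 0, 0, 0]
```

Running pose estimation on such a masked frame means only the player inside the rectangle can produce keypoints, which is one way to separate the two players.
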
Info

Channel: Muhammad Moin

Views: 4,929

Keywords: yolo, yolov7, object detection, pose estimation, deep learning, computer vision, opencv, pytorch, lstm, LSTM, keypoints

Id: XRJpiNaPbMg

Length: 32min 21sec (1941 seconds)

Published: Tue Jun 06 2023