Football Players Pose Estimation | YOLOv7 | Google Colab | step-by-step Tutorial

Captions
[Music] [Applause] The FIFA World Cup is almost over, three more games to go, but the whole tournament gave me a lot of motivation to build projects showing how you can use computer vision in sports. Last week I built football player tracking, and you can watch the video showing how to do that on the Roboflow channel; the card is visible in the top right corner right now. Today, on my own channel, we will create a football pose detection algorithm.

The first step is to gather the data, so I drove out with Chris and recorded him juggling the ball with two cameras at the same time, from different angles. We then manually synchronized the recordings to make sure that we are looking at the same moment in time. We used two different computer vision models to process the videos: the first one was the latest YOLOv7 pose estimation algorithm, to extract the key points; the second one was a regular object detection model, to track the ball. After that was done, all we really needed to do was spend ten more hours cleaning up the data, calibrating everything, and creating the output visualization. [Applause] Easy!

Enough of the talking, let's jump into the code and let me show you how it's done. But before we do that, make sure to like and subscribe; this is actually my first video on the channel, so I would really appreciate your help. Let's go!

I divided the work into two separate Jupyter notebooks. The first one is all about feature extraction: we will run two computer vision models, like I said, YOLOv7 in object detection and pose estimation mode, and we'll save the results into JSON files. The second one will pick up those files and focus mostly on data cleaning, calibration, and creating the final visualization.

As usual, we'll start by making sure that we have access to the GPU. The nvidia-smi command returned output, so we see that we have access to a Tesla T4; that should be enough for us. The next step is to confirm the versions of the computer vision libraries that we'll use, in our case mostly torch and CUDA.
Everything is fine, so we can start to work on our project. If you have problems with nvidia-smi, that most likely means that you don't have access to the GPU, so you need to enable it: go to Runtime, Change runtime type, and over here change from None to GPU. That's pretty easy. And let me increase the font size so you'll be able to follow. The links to both Jupyter notebooks are, of course, in the description below.

Now we're ready to go. The first step is to install YOLOv7 in our Python runtime. We do it by executing the first few commands, which will pull the repository, install the required dependencies, and add the directory to the Python path. Unfortunately, YOLOv7 doesn't have a pip package, so we need to hack a little bit.

Now we are ready to download the videos. Like I said in the intro, we recorded with two cameras at the same time; I manually synchronized the videos and stored them on a drive, so we can download them right now. The files are around 200 megabytes each, so it shouldn't take a lot of time.

Cool. I created a few utils to generate frames from the video, so you can test right now whether everything works, and sure enough, we are able to pick the first frame from the video. Now we are ready to load the models into memory. The first step we need to accomplish is to pull the weights from the GitHub repository; after that is done, we should be able to first create the PyTorch device, and then load the object detection model and the pose estimation model. Twelve seconds later... Now we can test whether everything works by inferring on a single frame. Unfortunately, we need to go through a few extra steps: like I said, YOLOv7 doesn't have any pip package, so we need to essentially go inside the YOLOv7 code, load a few utilities from the repository, and glue stuff together so that we can use the model in a custom way. So I went into the repository, tried to understand how they load images before they actually feed them into the model, and prepared a few utilities
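Since YOLOv7 has no pip package, the notebook clones the repository and adds its directory to the Python path by hand. A minimal sketch of that workaround; the directory name assumes a plain `git clone` of the YOLOv7 repository:

```python
# YOLOv7 cannot be `pip install`-ed, so make the cloned repo importable
# by prepending its directory to sys.path.
import sys
from pathlib import Path

# Directory created by `git clone https://github.com/WongKinYiu/yolov7`
YOLOV7_DIR = str(Path("yolov7").resolve())

if YOLOV7_DIR not in sys.path:
    sys.path.insert(0, YOLOV7_DIR)
```

After this, `from models.experimental import attempt_load`-style imports from inside the repo resolve as if YOLOv7 were an installed package.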
to pre-process frames for object detection and for pose estimation. I know, I know, I talk a lot, but I'm still not finished: we loaded the pre-processing utils, but we also need post-processing utils. Typically, neural networks return tensors that require a little bit of cleanup for the output to be useful, so similarly we need separate functions for object detection and for pose estimation. And now we can finally infer on a single frame with both models. We see that the object detection model noticed two objects in the scene, a person and a ball, and on the same frame the pose estimation model detected the whole silhouette of the person.

Having all those tools in place, we can now proceed to process both videos, and thanks to the magic of cinema, we can now take a look at the results. The whole processing takes a little bit of time, as we run two models simultaneously; I was able to achieve around two, maybe three frames per second on that weak GPU on Google Colab. That low FPS is also the result of me loading quite heavy models. For this project, the speed of processing was not my priority; on the other hand, I really wanted to get as good a quality of predictions as possible, because a lost key point basically results in breaking the animation, and I really wanted to avoid that. Like I said before, we dump all results into JSON files so that we can later load them into the second Google Colab notebook.

The second notebook seems shorter, but it actually took me a lot more time to develop. We start by downloading the JSON files; similarly to the videos before, I host them on a drive, so you'll be able to download them. Now that we have loaded all the extracted data into memory, let's display the representation of the first frame from the video. I use matplotlib to display a scatter plot representing the key points of the silhouette, and one of the first things you should notice is that the direction of the y-axis is different for YOLOv7 and for matplotlib: the silhouette is basically
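The per-frame results described above are dumped to JSON so the second notebook can pick them up. A minimal sketch of such a round trip; the field names are illustrative, not the exact schema used in the video:

```python
import json

# Hypothetical per-frame record: one ball detection plus pose keypoints,
# each keypoint stored as [x, y, confidence] in image coordinates.
frame_result = {
    "frame": 0,
    "ball": {"x": 512.0, "y": 640.0, "confidence": 0.91},
    "keypoints": [[300.0, 200.0, 0.98], [310.0, 220.0, 0.95]],
}

serialized = json.dumps(frame_result)  # what gets written to the .json file
restored = json.loads(serialized)      # what the second notebook reads back
```

Note that image coordinates put y = 0 at the top of the frame while matplotlib puts it at the bottom, which is exactly why the raw scatter plot comes out upside down; calling `ax.invert_yaxis()` on the matplotlib axes is one standard way to flip it back.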
upside down. We can now draw a similar scatter plot for the second video. Similar story here, the silhouette is upside down, but what I wanted to show you is that the size of the silhouette is different in the first video and in the second video: the silhouette in the first video is around 800 pixels high, and in the second video around 600 pixels high. We will need to calibrate both videos so that later on we can combine those silhouettes into one 3D model.

The next stage is exactly that: calibration of the silhouettes. I will not go through all the math here, but what we are doing is mainly two things: flipping the image so that the head is higher than the feet, and calibrating the size of the whole silhouette so that the representations from the first video and from the second video have the same height, set to 1000 pixels. Here are the results of the calibration. I guess the last thing I didn't mention is that we needed to calibrate the ball position along with the whole silhouette, so that the interaction between the ball and the player would look natural.

And finally, the time has come for the main event of the video: the 3D model. I really spent an absolutely unreasonable amount of time and effort to create it, but I think it turned out quite well, so let's see it. This is a single frame from the animation, created by the draw 3D method; you can see that you basically pass the ID of the frame from the whole video as well as the angle of rotation, and it creates a representation of what we saw in the videos. Now I can change the index and change the angle, and I will get a completely different view. And of course, the last part is to run the script for every frame and use ffmpeg to combine everything into the video. Here is the final result. I added a trace of the path of the ball and the feet, for extra points in the artistic category; that's actually pretty easy to implement using Python's deque. And the final touch is basically
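The calibration step described above (flip the silhouette so the head is above the feet, then rescale to a common 1000-pixel height) can be sketched as a small pure-Python function; the function name and the sample points are illustrative, not taken from the notebook:

```python
TARGET_HEIGHT = 1000.0  # common silhouette height used in the video

def calibrate(points):
    """Flip image coordinates (where y grows downward) and rescale so the
    silhouette spans exactly TARGET_HEIGHT pixels. points: list of (x, y)."""
    ys = [y for _, y in points]
    min_y, max_y = min(ys), max(ys)
    scale = TARGET_HEIGHT / (max_y - min_y)
    # (max_y - y) flips vertically so the head ends up above the feet;
    # scaling x by the same factor preserves the aspect ratio.
    return [(x * scale, (max_y - y) * scale) for x, y in points]

# A 600-pixel-tall silhouette: feet at y=800, head at y=200.
calibrated = calibrate([(100.0, 800.0), (100.0, 200.0)])
```

After calibration the feet land at height 0 and the head at height 1000, regardless of how tall the silhouette was in the source footage, which is what lets the two camera views be merged.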
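The trailing ball-and-feet trace mentioned above is a natural fit for a fixed-length `collections.deque`: once the deque is full, each append silently drops the oldest point, so only the most recent positions remain to be drawn. A small sketch with made-up coordinates and an illustrative trace length:

```python
from collections import deque

TRACE_LENGTH = 30  # how many recent frames the trace keeps (illustrative)
trace = deque(maxlen=TRACE_LENGTH)

# Feed in one (x, y) ball position per frame; once the deque is full,
# old points fall off the front automatically.
for frame_index in range(100):
    trace.append((float(frame_index), float(frame_index) * 2.0))
```

At each rendered frame the animation code would simply draw every point still in `trace`, giving a path that fades out after `TRACE_LENGTH` frames with no manual bookkeeping.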
rotating the whole scene with every frame. Great! Thanks to Chris for participating in the video, it really wouldn't have happened without him, and oh boy, I have a lot of respect for his football freestyle skills. Hopefully you found this video useful and entertaining, and if so, please hit the like and subscribe button. I have a lot of cool ideas for the next videos related to computer vision. Make sure to follow me on GitHub and Twitch. My name was Peter, bye! [Music]
Info
Channel: SkalskiP
Views: 16,147
Id: AWjKfjDGiYE
Length: 9min 46sec (586 seconds)
Published: Wed Dec 14 2022