Football Players Tracking | YOLOv5 + ByteTRACK | Google Colab | step-by-step Tutorial

Captions
The FIFA World Cup knockout stage is currently in progress. My team, of course, is already gone (see you in four years, hopefully), but I still decided it was a great opportunity to create a piece of content related to that event and kill two birds with one stone by answering your requests for a legit tracking tutorial. So, as you saw in the intro, what we'll be doing today is tracking players on a football field. And yes, I decided to dress accordingly. I know it is the club jersey, not the national jersey, but this is the only one I have.

This is actually a pretty complicated task to accomplish, because we need to go through multiple stages. The first stage is object detection. We created a separate video about it that you can watch by clicking the card visible in the top right corner right now, but long story short, it's about saying what do I see in the image and where do I see it. We usually use a bounding box to determine the location of the object and a class name to determine the type of object.

The next stage is tracking. Tracking is basically about looking at subsequent frames of the video and saying that the object I see in the first frame is the same one I see in the next frame. There are different types of trackers: some of them use neural networks, some of them just plain math, calculating intersection over union between the frames. Regardless of the type, the result is usually the same: a unique object ID assigned to every bounding box.

When those first two stages are done, you can basically unleash your creativity, because we have so much data to work with. In this tutorial I use the locations of the ball and the players to calculate their proximity and assess which player is currently in possession of the ball, as well as creating custom annotators that add a lot of character to the final video. Enough of the talking, let's jump into the Jupyter notebook, where I will show you each of those steps as well as some pretty cool tricks that you can use in your computer vision projects. And before we start, make sure to like and subscribe. It really helps the channel and gives me a lot of motivation to create those crazy tutorials for you. I really slept like three hours today.

As you most likely know from the thumbnail and the title of the video, we will be using the YOLOv5 and ByteTrack combo. We of course start, as usual, by confirming that we have access to the GPU. By the way, the link to the Jupyter notebook will be in the description below. And of course we need to increase the font size so that you can follow what I'm doing over here.

Now we are ready to download the football game videos that we'll be using in our tutorial. The dataset is divided into three directories: clips, test, and train. Train contains very long videos, like 40 minutes long or something like that; this is something that we will most likely not use. I'm mostly interested in the clips directory: around 150 videos that are 30 seconds long, perfect for our use case. To get access to the dataset you need a Kaggle account, but after logging in you should be able to either download the videos manually or use the Kaggle API to do it from Python. The only things you will need are your Kaggle username and an access key that can be generated in your account, and right now we can enter those in our Jupyter notebook. Those are of course secrets, so you shouldn't share them with anybody. Looks like everything is set up properly, we have all the required environment variables, so we should be able to just list the files in our dataset. Notice that I'm only printing the top ten rows.
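In code, the credential setup and the file listing might look roughly like the cells below. The environment variable names are the ones the Kaggle CLI reads; the competition slug is my assumption about the dataset used here.

```python
import os

# Kaggle API credentials, generated under "Account" -> "Create New API Token".
# These are secrets; never commit or share them.
os.environ["KAGGLE_USERNAME"] = "your-username"      # placeholder
os.environ["KAGGLE_KEY"] = "your-secret-api-key"     # placeholder

# List the dataset files and print only the first 10 rows
# (the competition slug is an assumption).
!kaggle competitions files -c dfl-bundesliga-data-shootout | head -10
```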
Everything works as expected, so now we can proceed to actually downloading those videos. I created a small bash one-liner that does it all for us: I list the files in the clips directory, take the first 20 of them, fetch them one by one into the Jupyter notebook, and after they are downloaded, I unzip them.

Now it would be cool to run YOLOv5 inference on one of those videos to get a bit of a feeling for where we are, but before we can do that, we need to set up the YOLOv5 environment. So we run the cell over here that clones the project and installs all required dependencies, and after a few seconds we are ready to go and can execute inference on our first video. Thanks to the magic of cinema, we can see the results right away, and I was actually really disappointed initially. I was hoping that I would be able to use a model pre-trained on the COCO dataset, but I noticed a few things that I really don't like. First of all, the ball, the primary object I'm interested in, is only visible in a few frames; the model has real problems with picking it up on a regular basis, and those few frames are not enough for me to reliably track it and calculate the proximity to other players. On top of that, we are picking up a lot of extra detections, and it would be really hard for us to distinguish between coaches, referees, players, and other random people who are simply close to the pitch. Sure, I could try to extract the whole green part of the image and filter detections based on that condition, but that proved to be highly inaccurate because of the different shades of green.

It was at this moment that he knew... That was the moment when I knew I needed to train my own custom model. I cried for a little bit, and after that I created a Python script that used the pre-trained COCO model to automatically annotate my images. I uploaded them into Roboflow and started refinement, and a few hundred images and four hours later, I was finally able to train my model. The actual training took like another five hours of my life. I cried a bit more, but long story short, here we are: I have the weights and we can use them. Booyah! By the way, if you would like to save ten hours of your life and use my already existing dataset, it is accessible on Roboflow Universe. The link will be in the description, and I'm super excited to see what you guys will be able to build with it.

Now we can go back to Google Colab, download the model that I trained, and use it to infer on our video. The new results look much more promising. We see a direct comparison: the old model on the left side, the new model on the right side. The ball is being picked up much more often, and on top of that, we don't have any extra detections on the sidelines. As a bonus, I added two extra classes, referee and goalkeeper, and although the referee class trained pretty well, the goalkeeper class, most likely because of class imbalance, has not really trained. I didn't have more time to retrain the model, so in a later stage of the project I just decided to treat every goalkeeper as a player. Simple solutions are always the best ones.

Up until now we've been using the detect.py script from the YOLOv5 repository, but right now we need to migrate towards something that will allow us to mix detection and tracking. In the meantime we also create a few utilities to read frames from the video and display them in the notebook, and now we can use them to extract the first frame from our video and display it on the screen, and later on use torch.hub to load our own custom weights into the model and run inference on that particular frame.
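The download one-liner might be sketched like this; the competition slug, the grep pattern, and the column layout of the listing are assumptions, so adjust them to what the files command actually prints.

```python
%%bash
# Take the first 20 clip archives from the listing, download them one by one,
# and unzip everything (competition slug and column positions are assumptions).
kaggle competitions files -c dfl-bundesliga-data-shootout \
  | grep clips | head -20 | awk '{print $1}' \
  | xargs -I {} kaggle competitions download -c dfl-bundesliga-data-shootout -f {}
unzip -o '*.zip'
```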
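The YOLOv5 setup and the first COCO inference run are roughly these cells (the checkpoint and clip filename are illustrative):

```python
# Clone YOLOv5 and install its dependencies.
!git clone https://github.com/ultralytics/yolov5
%cd yolov5
%pip install -r requirements.txt

# Run the stock COCO checkpoint on one of the downloaded clips.
!python detect.py --weights yolov5x.pt --source ../0a2d9b_0.mp4
```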
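And the frame utility plus the torch.hub load of the custom weights might be sketched as follows; generate_frames, the weights path, and the clip name are illustrative rather than the notebook's own.

```python
from typing import Iterator

import cv2
import numpy as np
import torch

def generate_frames(video_path: str) -> Iterator[np.ndarray]:
    """Yield consecutive BGR frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield frame
    cap.release()

# Load YOLOv5 with the custom football weights (path is illustrative).
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

# Grab the first frame and run inference on it (YOLOv5 expects RGB).
frame = next(generate_frames("0a2d9b_0.mp4"))
results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
print(results.pandas().xyxy[0])
```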
Looks like we are ready for the next step, which is installing ByteTrack in our development environment. ByteTrack is actually a pretty simple one; it uses intersection over union to track objects. Still, it's almost SOTA on Papers with Code, so I decided to use it. Unfortunately, like many projects in the computer vision community, it's packaged rather poorly, so we need to jump through a few burning hoops to actually install it. The first thing we do is obviously clone the project and install all required dependencies, and after that is done, we need to sys.path.append it onto our Python path. Yikes. ByteTrack doesn't actually append a tracker ID to existing bounding boxes, but rather predicts where the bounding box will be on the next frame, so we need to use a small hack to match those two groups of bounding boxes: the ones we got from the detector and the ones ByteTrack proposes. We will use IoU to do that; that's why we need one additional library to do it efficiently.

The detector and tracker are installed, so now we can start to play around. I created a bunch of very useful and very long utilities that are here to help me out with annotation. You can see that I built small abstractions over the OpenCV methods to allow for ease of use, and now we can test them in action by picking the first frame from our video and drawing those small circles around the players, referees, and the ball. At that point I was still using goalkeeper as a separate class, but that will change in just a few minutes.

The next stage is to figure out which player is currently in possession of the ball. To do it, I specified a ball proximity parameter set to 30 pixels, and whenever the ball is inside a player's bounding box or within 30 pixels of it, that player is marked as currently in control. This is a pretty simple heuristic, but it works pretty well. On the first video frame we unfortunately don't have any player in control, but you can notice that I added one more marker directly over the ball. On the full video we can clearly see that different players are being marked as in control as the ball moves between them. Unfortunately, the ball detection is still not perfect, so from time to time we lose the ball tracker, only to regain it after just a split second; that especially happens when a player occludes the ball.

And finally, the moment has come: we can combine all of those small code pieces into a single runtime. We will be looping over the frames, running the model, dividing detections into different classes, figuring out which player is currently in control, tracking most of the classes (so goalkeeper, player, and referee), and at the very end annotating the frame and saving it to the video. The end! I'm super happy with the final result. I was aiming for something along those lines from the very beginning, although I slightly underestimated the effort. Originally I was actually going to cluster players into teams, so right now you see that all players are marked as green, but I planned to divide them into teams based on the average color of their shirts. I managed to do it on the Twitch stream, but I didn't really have time to put that into this particular notebook. Maybe that's an idea for a future video.
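The ByteTrack setup, including the sys.path hack, might look roughly like this (the Colab path is an assumption):

```python
# Clone ByteTrack and install its dependencies.
!git clone https://github.com/ifzhang/ByteTrack
%cd ByteTrack
%pip install -r requirements.txt
!python setup.py develop

import sys
sys.path.append("/content/ByteTrack")  # make the yolox package importable

from yolox.tracker.byte_tracker import BYTETracker
```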
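The matching hack computes IoU between the boxes ByteTrack predicts and the boxes the detector produced, then copies each track ID onto its best-overlapping detection. The video mentions an extra library for the IoU part; here is a self-contained NumPy version of the same idea, with illustrative names:

```python
import numpy as np

def box_iou_batch(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between two sets of (x1, y1, x2, y2) boxes."""
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    lt = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])  # overlap top-left
    rb = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])  # overlap bottom-right
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def match_detections_with_tracks(detections: np.ndarray, tracks) -> list:
    """Give each (x1, y1, x2, y2, ...) detection the ID of its best-overlapping track."""
    ids = [None] * len(detections)
    if len(detections) == 0 or len(tracks) == 0:
        return ids
    track_boxes = np.array([t.tlbr for t in tracks])  # STrack.tlbr is (x1, y1, x2, y2)
    iou = box_iou_batch(track_boxes, detections[:, :4])
    for t_idx, d_idx in enumerate(np.argmax(iou, axis=1)):
        if iou[t_idx, d_idx] > 0:
            ids[d_idx] = tracks[t_idx].track_id
    return ids
```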
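The annotation utilities are thin wrappers over OpenCV drawing calls; a minimal sketch of one such annotator (the class name and style parameters are mine, not the notebook's):

```python
from dataclasses import dataclass

import cv2
import numpy as np

@dataclass
class EllipseAnnotator:
    """Draw a flat, broadcast-style ellipse under a bounding box."""
    color: tuple = (0, 255, 0)  # BGR
    thickness: int = 2

    def annotate(self, frame: np.ndarray, box: np.ndarray) -> np.ndarray:
        x1, y1, x2, y2 = box.astype(int)
        center = ((x1 + x2) // 2, y2)                            # bottom-centre of the box
        axes = (max((x2 - x1) // 2, 1), max((x2 - x1) // 6, 1))  # wide, flat ellipse
        cv2.ellipse(frame, center, axes, 0, -45, 235, self.color, self.thickness)
        return frame
```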
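The possession heuristic itself is tiny: pad each player box by the 30-pixel proximity parameter and check whether the ball centre falls inside. Function and variable names below are illustrative.

```python
import numpy as np

BALL_PROXIMITY_PX = 30  # per the video

def get_player_in_possession(player_boxes: np.ndarray, ball_box,
                             proximity: int = BALL_PROXIMITY_PX):
    """Return the index of the player in possession of the ball, or None."""
    if ball_box is None:
        return None
    cx = (ball_box[0] + ball_box[2]) / 2  # ball centre
    cy = (ball_box[1] + ball_box[3]) / 2
    for i, (x1, y1, x2, y2) in enumerate(player_boxes):
        if x1 - proximity <= cx <= x2 + proximity and y1 - proximity <= cy <= y2 + proximity:
            return i
    return None
```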
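The final runtime loop, reusing the sketches above, might come together roughly like this. BYTETrackerArgs mirrors the argument object the ByteTrack demo expects, and the thresholds and file names are illustrative defaults, not the notebook's exact values.

```python
from dataclasses import dataclass

import cv2

@dataclass(frozen=True)
class BYTETrackerArgs:
    track_thresh: float = 0.25
    track_buffer: int = 30
    match_thresh: float = 0.8
    mot20: bool = False

tracker = BYTETracker(BYTETrackerArgs())
annotator = EllipseAnnotator()

cap = cv2.VideoCapture("0a2d9b_0.mp4")  # illustrative clip name
fps = cap.get(cv2.CAP_PROP_FPS)
size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
cap.release()
writer = cv2.VideoWriter("annotated.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

for frame in generate_frames("0a2d9b_0.mp4"):
    df = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).pandas().xyxy[0]

    # Split detections by class; goalkeepers are treated as players, as above.
    people = df[df["name"].isin(["player", "goalkeeper", "referee"])]
    balls = df[df["name"] == "ball"]
    boxes = people[["xmin", "ymin", "xmax", "ymax", "confidence"]].to_numpy()
    ball_box = balls[["xmin", "ymin", "xmax", "ymax"]].to_numpy()[0] if len(balls) else None

    # Track people across frames and match track IDs back to the detections
    # (track_ids could be rendered next to each box with cv2.putText).
    h, w = frame.shape[:2]
    tracks = tracker.update(boxes, (h, w), (h, w))
    track_ids = match_detections_with_tracks(boxes, tracks)

    # Flag possession and draw the annotations (red for the player in control).
    in_possession = get_player_in_possession(boxes[:, :4], ball_box)
    for i, box in enumerate(boxes[:, :4]):
        annotator.color = (0, 0, 255) if i == in_possession else (0, 255, 0)
        frame = annotator.annotate(frame, box)
    writer.write(frame)

writer.release()
```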
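For the team-clustering idea, one minimal approach (just a sketch, with illustrative names) is to average the shirt colour in the upper half of each player box and run 2-means on those colours:

```python
import numpy as np
from sklearn.cluster import KMeans

def assign_teams(frame: np.ndarray, player_boxes: np.ndarray) -> np.ndarray:
    """Cluster players into two teams by mean torso colour (very rough heuristic)."""
    colors = []
    for x1, y1, x2, y2 in player_boxes.astype(int):
        torso = frame[y1:(y1 + y2) // 2, x1:x2]  # upper half of the box, roughly the shirt
        colors.append(torso.reshape(-1, 3).mean(axis=0))
    return KMeans(n_clusters=2, n_init=10).fit_predict(np.array(colors))
```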
In the meantime, make sure to check out the Roboflow Notebooks repository; this tutorial is actually part of that repo too, and there is a ton of useful resources over there for anybody who is interested in computer vision. If you liked the video, make sure to like and subscribe, and stay tuned for more computer vision content coming to this channel soon. My name is Peter. Bye!
Info
Channel: Roboflow
Views: 36,666
Id: QCG8QMhga9k
Length: 11min 37sec (697 seconds)
Published: Wed Dec 07 2022