Introduction to tracking data in football.

Okay, so welcome back to the Mathematical Modeling of Football course. Today we are going to focus on tracking data, and it's quite exciting coming to this stage, because one of the original reasons we set up the YouTube channel that all the videos are hosted on was that we wanted to talk about how you can use tracking data in football.

Tracking data is relatively new. When we talk about tracking, what we mean is that we're tracking the coordinates of the players on the pitch: where they are, how fast they're moving, where the ball is. We're also going to look at calculating whether they get to the ball first, and so on. So the tracking data itself is the x, y coordinates of the players on the pitch.

Now you can contrast that with event data. Event data is what we've looked at so far: the position of what happens to the ball. There we only have the position of the player who's got the ball and, not really the trajectory, but the movement of the ball via passes. That's what you've been doing your project on, and that's what we've been analyzing so far. But of course that misses a massive amount of information, because there are actually 22 players on a football pitch most of the time, and they're all moving around and they're all affecting the game. That's kind of the whole idea here: this is a collective sport, it's not just down to one individual. So in some way, with the event data we're only getting one twenty-second of the information that we need to understand a football match. In fact, event data gives us quite a lot more than that, because it is true that the most important things happen with the ball. But really, to get any context or understanding, we need to have tracking data, and that's what we're going to focus on and talk about today.

So my plan for today is, first of all, I'm just going to tell you a bit about tracking data
where it comes from and the basic things you can do with it. Really I'm just going to go over the things that are also in Laurie's video, which you can go down and look at on the web page. Then I'm going to talk about pitch control, which is one of the first ways we can measure how a team is controlling space. Then I'm going to look at passing models, and then I'm just going to briefly mention a few things to do with formations and also, I think, something to do with fitness data as well, which will come up in a lecture that Suds will give later in the course. And then, in the last part of the course, I'm just going to go through some very nice new tracking data we've got available. Hammarby and Signality have very kindly agreed to provide tracking data from three of their games from last season, so I'll go through how to use that data later in the course, after I've given the summary.

Right, so let's get going then. The structure for today: a preview of the video material. What I mean by that is there's a lot of video material on the web page. As I said, when we set up the Friends of Tracking channel, what we really wanted to do was give information about how you analyze this stuff. So what I've done is I've organized the Friends of Tracking channel material on the course web page, so you can go through it step by step and gradually learn all the things that you need to know. I'm going to start by looking at measuring player movements, which is in a series of lectures given by Laurie. Then I'm going to look at pitch control, and then finally I'm actually going to look at formations and fitness data. I've written expected possession value here; I'm going to go into that in a later lecture. So I'm going to talk a little bit about formations and fitness data. I'll have a break between 11 and 11:50, and please ask questions in the chat section and I'll answer them as I go along. I'll take a reasonably slow pace, I
always say, before I set off at speed in a random direction. Oh yes, and then the last part of the lecture will be the Signality data, just going through the code that I've put up on the GitHub. One thing: if you do want to download this code already, I've sent the password to the course participants inside the course, and I've also put it into the Slack channel for those who aren't registered inside the course. I can't share it publicly, but I've put the password there, so anybody who is determined enough can do it. So you can already download the Signality data if you want to get started with that.

Okay, so on Thursday at four o'clock Pascal Bauer is going to come here and we're going to have a live lecture by him. This won't be available on YouTube until he's cleared it with the DFB, the German football federation, who he works with. So it will be a live lecture at four o'clock; I'll send out the link for that. He's going to go through where different types of data come from in football.

I already mentioned event data; that's what we've been looking at so far. When we talk about StatsBomb data or Wyscout data, this is the data we've been looking at. I haven't actually talked so much about how it's acquired. It's done manually: somebody sits there in the stadium, often two people if they're going to do it live, and they note down every event that occurs in the football match. You can do it live, or you can also do it on video afterwards.

But the type of data we're going to talk about today is optical data, or tracking data. It's the 2D positions of the 22 players on the pitch, nowadays often at 25 hertz, and in theory it's the 3D positions of the ball. Often you only get the 2D positions of the ball; that happens in the Signality data that we're going to look at.

There is another form of data, which Suds, who will give a later lecture in the
course, will go into a little bit, and that's GPS data. With GPS you can have a local GPS system which has a very high accuracy, and you can also put it on various parts of the body: you can put it on your feet, you can put it on your body and so on. So there you can really accurately track things. We're not actually going to go so much into this type of data in our analysis, for a couple of reasons. It is very seldom publicly available, because in a sense it's personal information, since it also often includes heart rate and things like that. And also it tends only to cover one team, so if you're working with a club you only have your own team's positions, and that isn't necessarily very useful in analyzing a football match, if you only know where your own players are. So the tracking data is much more useful for many of the tactical things that we'd like to do.

Okay, so the first video. Again, all I'm going to do here is summarize the contents of the videos which are available on the web page. In the first video, Laurie goes through tracking data, and this is an example of data from Metrica, where they've released two matches. Here we see the coordinates of the players: all 22 players tracked, the red team versus the blue team. We've gone with the idea that we're shooting left to right. Now, if we're going to have tracking data we have to represent both teams. Often people will show it in the same direction as it's shown on TV, because ultimately you want to be able to overlay this type of thing, or you want to be able to explain it with reference to video. So when you display the data it makes sense to have the teams attacking in the same way as in the TV footage. There can be complications there, for example that the TV footage might be from one side and the tactical
camera might be on the other side. But often they're on the same side of the pitch, and so you should display the data left to right, in the same direction as if you're watching on TV. That makes it so much easier to understand what you've done later, if you're comparing it with or overlaying it on TV footage.

Okay, so the ball is here; it's that little black dot. And that's basically it: we've got the positions of all of the players and the ball. In the Metrica data set there's actually coupled event data, so the passes of the ball are shown by the events. I'll come back to that.

Laurie has also added vectors here, calculating the movement of the players. The way he does that, as he illustrates in the video, is to look at the difference in where they've actually moved over a number of time frames. There are all sorts of little tricks there for how much you should smooth the data in order to get reliable measurements of these types of things, and Laurie goes into some of the details of that. It's very much a balancing act to try and get these things right. It's difficult to get exactly accurate measurements, and we're going to come back several times to the fact that there are quite a lot of errors in the data. But you can quite reliably measure the direction of the players, and this is normally represented by adding a vector arrow pointing in the direction that they're moving over one or two frames, maybe three or four frames, into the future. So the length of the vector represents the extent of the movement, and the direction obviously represents the direction they're moving in.

In this particular situation the blue team have just won the ball and everybody is moving very quickly back. Even this type of information is extremely valuable, because you can actually see how fast is number six moving, how fast is number
nine moving and how fast is number five moving when this counter attack has started. So even there, there's already useful information for coaches and players: who runs back fastest? Ten and eight are running back slower in this situation. And as I said, Laurie's added the pass direction here. In this particular data set there is an event data set coupled to the tracking data, and he uses the event data to find out the position of the ball.

Here's a little video of this. He shows how to make a video, and it's very useful if you make a video. That's the situation we see. I often think, I don't know that I prefer to watch football in this way, but I find I have so much more understanding of what's going on when I see football in this way than I do when I watch it on TV, because you can actually see all of the players. You can see their movements, you can see how fast they're traveling, and, we're going to show this in a little bit, you can see the space opening up for them. Of course, what you don't see is the body positioning, so how well they're orientated with respect to the ball if they're receiving a pass. That's a disadvantage. But everything to do with a tactical view of the game you can see much more clearly from this top-down view with the arrows. Even if you do have a very high camera, and some grounds do have a high camera, the tactical camera you call it, that tactical camera doesn't really give you the same impression as actually seeing it accurately top down.

So before I move on to pitch control, I'll give you a chance to pose any questions you have about that type of data. I'll just say that Rodrigo is, as usual, in charge of the Slack page and he's very able to help you if you need to get onto the Slack page to get hold of the data. Okay, everyone seems happy with where we
are just now; no questions. A lot of you will be familiar with these ideas with tracking data. Good, I'll move on then.

So pitch control is one of the main tools that we currently use a lot for tracking data. You see this more and more in the public domain. I haven't seen it so much on television yet, but certainly clubs are starting to use it, and it's a tool that we use a lot in my work at Hammarby, and the players love to look at pitch control things. There's a kind of history of pitch control, and I'll start building up from that.

This is one of the goals which inspired me most when I was writing Soccermatics. It's a Messi goal, obviously, and of course in this clip Messi beats basically the entire team, goes through them and scores. But what is very interesting if you look closely at this clip is that Messi isn't alone in this situation. You also have Xavi, Iniesta and Pedro all surrounding him, and two of those players give a pass to him. So how does that work? This was when it was called tiki-taka football. You don't really hear that expression so much anymore, but it was tiki-taka football: moving the ball around, tick, tick, tick, tack, between the players and passing very rapidly in order to come directly through the opposition. That's exactly what Messi was doing in that case. Of course it's a brilliant run by him, but he doesn't simply beat all of those players: he beats them by passing the ball and getting it directly back to his feet again.

What I noticed when I started to look at these types of tiki-taka football videos is that you can break this down with a very old mathematical concept called the Voronoi diagram. There are two of these diagrams, for the two situations: the first one is the pass by Xavi, the second one is
the pass by Pedro. What this Voronoi diagram does is break down all of the area into the parts closest to Xavi, closest to Messi and closest to Iniesta. So if I take this zone here, all of the points within this zone are the points that are closest to Xavi; all of the points within this zone are the points which are closest to Messi; and everything in this zone is the points that are closest to Iniesta, and so on. Now, to be very specific here, that's within the Barcelona team. If you look for Voronoi diagrams in football you'll see various variations of this. Sometimes it's which point is closest across both teams, and we'll see that in pitch control as well. But here specifically I was interested in which point is closest for each of the Barcelona players.

The reason this is interesting is that if you look at the two defending players here, they're right on the line between Xavi and Messi, and what that means is that they are maximally distant from both of them. So when the pass is taken from Messi to Xavi, they are maximally distant from these two defending players, and if they're maximally distant you're going to have a greater chance of this pass succeeding. The same thing is true over here when you look at Messi and Pedro. It's a much smaller area, but Pedro has spread out so that the opposition players are on the edge of the Voronoi diagram, so they're maximally distant from Messi and Pedro and also Xavi. This is by no means a perfect rule for football, but this seems to be, from a mathematical perspective, what Barcelona are doing in a wide variety of situations.

What I think is really beautiful about this is the following. It's one thing to say, you know, here's the Voronoi diagram and this looks very nice. But the Voronoi diagram has what we call a dual graph, and that dual graph is just lines: if you draw a line from Xavi to Messi, perpendicular to the dotted line of the zone and directly between them, this produces
the Delaunay triangulation, which is the dual graph of the Voronoi diagram. So there's a line here between Messi and Xavi, there's a line here between Iniesta and Xavi, and there's a line here between Pedro and Xavi. What's lovely here is that if you maximize this space, as they've done here, you sort of get for free a triangulation which allows you to pass between all of the different players. So maximizing one criterion, i.e. space for yourself, also maximizes, or at least to some degree optimizes, another criterion, which is these passing triangles between all of the players. And that's very nice, because you don't need to explain Voronoi diagrams and Delaunay triangulations to players. All you need to do as a player is to think about how you are moving into space, how you are managing to create as much space as possible for yourself, and then the triangular passing alternatives will come for free.

The reason I want to go into so much depth on this is that it's why pitch control is interesting. It's not just about who's going to get to the ball first, and I'm going to go into the details of exactly that later. This type of approach of maximizing space is interesting because it also starts to maximize your passing alternatives, without you necessarily saying explicitly that you're trying to maximize passing alternatives. Moving into space maximizes your passing alternatives as well. When we're studying collective behavior we like to think about this as a kind of emergent effect: a simple rule, you move into space, produces an emergent pattern, which is very good passing alternatives. It's not necessarily that you moved into space to produce these, but it's a sort of emergent side effect of this type of movement.

Okay, so that's the theoretical background. I like to tell this story because I wrote this in Soccermatics, and I by no means want to claim that I'm the first person ever to think about
using Voronoi diagrams in football; they've been used in other situations as well. But I wrote this in Soccermatics and then I went down to Barcelona to see Messi play and also to visit La Masia, which is the research institute in Barcelona. There I met this guy, Javier Fernández, who's also on the Friends of Tracking web page, and it turned out that he had just recently published a lovely paper called Wide Open Spaces. The limitation of the Voronoi diagram is that it doesn't take into account the speed of the players. What Javier did was account for the speed of the players in how he produced these types of maps of who gets to the ball first. In this case the green areas are where the Barcelona players, who are in the Barcelona colors here, would get to the ball first. The red areas are where the opposition team, the yellow team, would get to the ball first. The ball in this case is the white dot down here. So you can see that a pass there, if it were possible to make, is probably going to get to a Barcelona player; a pass back here is definitely going to get to a Barcelona player; a pass here, for example, even a pass there is... But now you can actually see who controls what space.

So how does he do this? What he and his colleague Luke Bornn essentially do is look at a sort of radial player influence for each of the players. That influence depends not only on the position of the players, like my original Voronoi diagram did, but also on the speed of the players. If the player is standing still, then his or her area of influence is a circle around them. In fact it's a Gaussian distributed variable: they use a normal distribution to describe the area of influence. And they also use a normal distribution for a running player. So this is the player influence for a player 15 meters away from the ball, running at 6.36
meters per second at a 45 degree angle. For every one of these, again, they have a normal distribution, and they take it along the axis that the player is running. So there's a normal distribution going this way and a normal distribution going this way, which describes how quickly they'll get to the ball. I mention this normal distribution a couple of times because we're going to come to the way that Laurie does it in his lectures. The way he does it is he actually tries to calculate, from a physics-based point of view, if you're running at this speed and you accelerate in this way, how quickly could you get to the ball. Javier's way of doing it is slightly different: he calculates a normal distribution based around these parameters of the direction that you're going and so on. You can read more about that in his paper, but this is actually used inside Barcelona, and it's used increasingly in other clubs when they're trying to analyze the positioning and how much space their team is occupying.

If we step back to Laurie's lecture, he asks the question: what are the options available to the player on the ball? He answers that question using his own pitch control model. In this case the blue areas are the areas that a blue player will get to first, and the red areas are the areas that a red player will get to first. You can see that his areas are a little bit sharper in their definition, and this is because it has a different underlying model. It is important to realize that there isn't one unique pitch control model; every one of them is based on some assumptions. Javier's assumptions are very nice and simple: there's a Gaussian distribution, a normal distribution, related to the speed and the position of the player, which determines where their influence lies. In Laurie's lecture he tries to build it from a more physics-based point of view. He asks: for a given location on the pitch, how
long would it take for the ball to arrive there, how long would it take for each player to arrive, and so what's the total probability that each team will control the ball after it's arrived? And he repeats that calculation for all locations on the pitch. Again, you have to watch Laurie's video to find out the details, but this is an estimate of the ball's arrival time, just given a fixed speed of the ball traveling: a very simple assumption. Then he calculates the player arrival times. Again he makes an assumption: there's a reaction time, I think he uses 0.7 seconds, during which you continue running in the same direction, and then you just accelerate towards the ball. That way you can calculate the arrival time for, for example, this player 4 and this player 24. In this calculation he says that on average it'll take two seconds for player 4 to get there and 1.5 seconds for player 24 to get there. From that you can calculate the probability that a particular player will win the ball. It's not definite, and that would be because of some error in the reaction time, for example, and some error in the running speed and so on. Those errors mean that it's never definite that one player will get there first. In these types of situations you really have a probability which you work out, and Laurie goes into the details and provides the code for how you work out these types of things. That gives you the answer for each of these points: for every point on the pitch you just calculate the probability that a particular player will arrive first.

Okay, are there any questions about pitch control before pass probability?

Good. I'm a bit behind, but I've got some good questions here from the first part, about the event data and the tracking data. Patrick Pershing asks: has there been any work done on trying to detect events from raw camera footage? Yes, lots of people are
trying that. Well, a lot of companies are doing it. It's really, really difficult. It's one of those AI problems that basically hasn't been solved, for various reasons. One of the reasons is that it's very difficult to track the trajectory of the ball, even if you've got very high resolution data. Another one is that it's just difficult with things like interceptions and tackles to get sufficient training data where you can really do this kind of thing.

Another question: how accurate is the tracking data? Okay, I mean, it's on the order of half a meter, maybe something like that. I don't have exactly the accuracy of it. It has errors in it, but that isn't the big problem. Maybe it's about 25 centimeters to half a meter, the accuracy for any given player. The biggest problem is identifying the wrong players. You'll find this in the Signality data that's available: sometimes the players will just switch, so it thinks it's tracking one player but actually it's tracking another player. That's because they often use jersey numbers to track the players and it doesn't quite work.

Mikas asks: is the share of pitch under control dependent on the velocity of the player? Yes, it is, in both of the models, both in Javier's model and in Laurie's model. In Javier's model it depends on it in the sense that the Gaussian is stretched depending on the velocity of the player. In Laurie's model it depends mostly on the direction: if you're running in the wrong direction to start with, then you'll be going away from the ball and it will take you longer to change direction and accelerate towards the ball. Laurie's model is a very simple idea: you're just running in the direction you were running before, then you change your velocity when you realize the ball
is going to a particular point, and then you accelerate towards that point.

Luka says: with this positional data on the Voronoi diagram, can we predict the pass receiver or the coordinate? I'm not exactly sure what you mean there, because the pass gets to the player so that they can receive it. But yes, you can; I think that's the idea of the model. The idea of the model is that, given where the pass went, you should be able to predict who gets the ball first. There's very little validation of the model against data. It's very difficult to validate it rigorously against data, so instead you make simple assumptions about how the players will move.

Could random forest be a way to get the same result with a more cost-effective model? I think there are lots of things, and maybe somebody can share this, but since Metrica released the tracking data there are some really efficient implementations of pitch control out there, so have a look for them. Really efficient and good implementations of pitch control have been made available.

Good, okay, so let's move on, and we will go to pass probability. If you watch William Spearman's lecture video on the Friends of Tracking web page, which I've also made available on the course web page, you'll find that he worries a lot about the details of pitch control. William Spearman now works for Liverpool, working with tracking data for them, and before that he worked for Hudl, and he developed these tools to do this type of pitch control. He has presented at several conferences and so on. What he is worried about, of course, is the details of these things. It's one thing to say that a ball will come to a player first, but there are other things. For example, when the ball is in the air, how do we calculate the pitch control then? What happens when the ball is in motion? And so he does that by actually
calculating the time of flight of the ball and working out whether it can be intercepted by the player, because often there's more than one point at which the ball can get to a player. In fact I think I have a diagram... oh no, I took out the diagram, sorry. In his paper he has a very nice diagram where he shows a trajectory of the ball and then several players trying to get to the ball. Sometimes one player will miss the ball and then it will be the opportunity for the other player, and so those types of pass probability things need to be taken into account.

Fran Peralta and I, or rather Fran did most of the work here, re-implemented the Spearman model and changed it a little bit so that we could calculate what we call pass probability. You can say that maybe it is a pitch control model, but it's a sort of more detailed pitch control model where you try to model the actual dynamics of the ball. So you have an equation for ball motion. What Fran did is he separated it into two parts: this is the equation for a ball in flight, and this is the equation for a ball moving against friction on the ground. He just made a very simple assumption here, that two thirds of a ball trajectory is in flight and the last third of it, at the end, is on the ground. It's a really naive and simple assumption, but you need to make these types of assumptions in order to get out models and to try and find out if things work. Then he would calculate the ball trajectories, which come out as these blue dots here, and he would calculate which player would get first to those points. So, like Laurie, he would have an equation of motion for the player, which is basically that force gives the acceleration, and there's a damping term here. He would solve all of these and then work out who would get to the ball first.

This isn't something that we're going to cover in detail or that I expect
you to implement in the course, but it's worth being aware of, and this is why these top clubs are recruiting physicists. It's worth being aware that in order to create these types of models, a physics background is very useful, to be able to describe the equations of motion of things. You also always combine the deterministic model of motion with some sort of probabilistic part. This is actually a logistic function: you have some probability of interception which is predicted, which gives a sort of smoother probability curve based on the original data. Again, you can get more details of this model from this paper, and also from looking at Will Spearman's video and some of his papers.

To give you an example of this, this is one that we actually work with at Hammarby. This is a pass that happened early last season. If I pause the video just before the pass, you see here that Alex Kačaniklić at this point has two pass choices. Either he can make the pass to the left winger who's running in here, or to the striker who is coming on the right here. Both of those pass opportunities are available to him, and what we can actually do with the model is calculate the probability of each of them working out. So here's the exact situation, the exact moment that he makes a pass decision between two different players. He's left footed, so this pass is a realistic choice, and this pass is also a realistic choice. The blue line here is the trajectory of the actual pass, and you can see all the positions of the players. Green means high pass probability, red means low pass probability. So his actual pass selection had quite a low probability, and even if you've managed to get into this area, you've got three defenders defending against one attacking player. So you
could actually show Alex that this was the better choice in this particular situation: there was a much greater chance that the forward, who was moving rapidly, would get to the pass than with the pass choice he actually made. This becomes a very useful tool for talking directly to players about their decision making in different situations. It also sorts out a lot of arguments after the match, where certain players say, "Well, you should have passed to me." You can look at these diagrams and have a more neutral discussion about the pass probabilities.

I've got five more minutes, so I'll show an analysis of one of Hammarby's goals from last season. This one isn't included in the tracking data, but there are other similar goals in there. This was one of my favorite goals. Hammarby are the green team, against the blue team, and you see that we get the ball back here outside the box and counter-attack. It's actually the player who won the ball back who runs up and scores the goal: that's Muamer Tanković.

We can analyze this using both pitch control and pass probability. We start with pitch control. This is pitch control of the whole situation, and I'm going to stop it just as the opposition pass goes into the box. At that moment we have a lot of control of the area inside the box, and a lot of control of our penalty area as well, so this is a relatively safe situation for us, and given our positioning it's not surprising that we regain the ball. I like to think of pitch control as who'll get to the ball first if it just drops down at random, and in this case it's one of our players, because we have a lot of control. It's Muamer Tanković, and he makes the first pass. I also want to emphasize here that our number 10 attacking
midfielder makes a very good run here to open up space on the left, dragging number 16 with him. Number 23, our forward, makes a nice run there on the right, backing out, and in this situation we have a lot of control over both attacking wings, which makes it very hard for the defending players. The ball goes out to the left and then comes back in.

So we can break this down using pitch control. This was the moment I stopped the video: we have a lot of control over the penalty area, so when the ball goes in we are very likely to get it back, as long as it comes out in the right direction. We also control and open up space during the attacking run. Defending situations are well described by pitch control, but in attacking situations pass probability tends to be a better way of analyzing it. You'll see that we have a very high pass probability here, which would also be high pitch control, a very high pass probability out here on the right, and a lower pass probability there in the middle. It's interesting, because Alex, number 20, is also the player we saw in the other clip. This time he makes the 98% certain pass out to the wing. There's a reasonable chance, 60%, that he can hit that pass back there, and it's into a more dangerous area, so there is some argument that it might be a good choice of pass, but this is the more certain one. Then, once Nico, number 40, has the ball, you can see that he's got two alternatives in here: number 23, the striker, and number 22, Tanković, who won the ball back originally. We can work out 65% there, but now you've got a sort of 40% expected goal, 50% expected goal if you get the ball there, and a slightly lower expected goal if the ball goes out there. So this is a very good choice of pass, which he makes, and we score a goal.

Good, I will stop that, and what we'll do now is have a break. You can put questions into the chat during the
break and I'll answer them when we start up again. I'll finish off by telling you about a couple more applications of tracking data, and then we'll look through the code. If you haven't done it already, now is a good time to download the code, so that you can look at it in Python while I'm talking about it. Okay, we'll have a break, and I will be back at 11:15. Thank you very much.

Okay, I'm back again. Great feedback and questions in the chat. In particular, Diego asked: "In this example you're not considering some important factors that, in my opinion, affect the pass a lot, for example the body orientation. Can you include them in the model using tracking data?" The answer to that is simple: we don't have the body orientation of the player. You can, for example, use the running direction of the player, which we do already have in the data, but one statistic I've heard is that 40% of the time players are not oriented in the same direction as they're running. If you think about defensive movements, you're looking forward while moving backwards and forwards. So it's very difficult to include that type of data.

When you see these types of models, there are two things to bear in mind. The first is the limitations of the data. When we build a model we only take into account what we've got, which is the positions and velocities of the players, and those are not always 100% accurate. The ball trajectories tend not to be accurate in the tracking data. They tend to be better if you couple them with event data, but they're still not perfect. That also covers several questions that have been asked about whether we should account for the ball being in the air or on the ground. Well, yeah, we
should, but we don't always know that from the tracking data. We're still in the early days of using tracking data, so that's one of the reasons for these simplifications. The other reason, and it's important to point this out, is that when we make simplifications in models like this, it's usually also so that we can understand and use the models. We want models that are "as simple as possible, but no simpler", a saying attributed to Albert Einstein. There's also the quote from George Box: all models are wrong, but some models are useful. So you want to make useful models which are simple and informative. There are always going to be problems with them. Those are the two reasons for the simplifications: one is the lack of data, and the second is that you always make simplifications when you're building these types of models.

There are questions about pitch control and performance, and I'll come back to some of that later. The question is whether pitch control can be a measure of performance. Yes, pitch control can be used to measure performance, but again, we're still in the early stages of that. It can be that a player standing in a lot of space is not doing anything useful. If you have a player who goes and stands down at the corner flag, he or she will have a lot of control over a certain area of the pitch, but that area won't be useful, and they may be playing attacking players onside or something like that. So you can't just use pitch control as it is. It's part of a framework of tools for understanding the game.

Great, okay, let's move on. This partly answers the question which just came up about the value of these things, and it gives you an idea of where we're going with this. What I've discussed
today are two different models of who will get to the ball first and what makes a successful pass. One is the pass probability model: green is the area where your team will get there first, red is the area where the other team will get there first. The other is pitch control, which is basically who gets there if the ball just drops down at random. You see that there's a very big correspondence between these two models: they match up in a lot of areas.

Later in the course, and this relates back to the expected goals models, we're going to look at value models for actions. Think of expected goals as a value model for shots. In week six we're going to look at value models for event data, and in week eight at value models for tracking data. That moves towards the answer to the question that came up just now about the value of pitch control. In this particular example, if you have the ball here, what's the probability of your team scoring if you pass the ball here, or here, or here? This is a heat map: the greater the heat, the greater the probability of scoring. So the best pass possible is the one right to the goal mouth, but clearly if you pass to the goal mouth you might get stopped by the keeper, because often the keeper's there. It's the pointwise combination of these two plots that gives you an estimate of the overall value, and that's called the expected possession value. This comes later in the course, but we're going to start in week six by working out pass impact, that is, the values of actions other than shots, which we've already seen. Then in week eight we're going to discuss ways of combining pass impact together with pass
probability to give an overall value of what is a good thing to do in football and what is a bad thing to do. By the end of the course we should have got some way towards that type of thing, so just to prepare you, that's where we're going.

A couple more things that I thought I'd bring up. One of them we're not going to spend much time on in the course, but it's quite interesting and a lot of research is being done on it: formations. This was one of the first examples of formation work that I saw, and I wrote about it in Soccermatics. It was done by Alina Bialkowski and her colleagues at Disney. They basically looked at team formations compared over a season. The symbols indicate the average position of the players during one half of every match during a season. You can see there are some teams here playing a very reliable 4-4-2, with the strikers moving backwards and forwards. Here is a sort of 4-4-2 but with a lot more movement around. Down here, this is probably due to a more pressing style of play. There are two strikers up here, so maybe it's... I think this is a 4-2-3-1, though I seem to have an extra player here, and there's some movement between those two players, so you can see they're changing their formation. Then there's another formation here, which is a five-at-the-back formation. So just by plotting the average positions of the players you can get reasonably clear patterns of the formations used by the teams.

Since Alina's work this has developed somewhat. This is a nice paper. I've forgotten to give the credit there, but Laurie Shaw has been working a lot on formations. This is one part of his work, where he looks at defensive formations at four instances of time during the match, and the ball
is in different places: the ball is here, here, here and here. How is the defense structured in those different situations? You have a sort of 4-3-3 defending formation by this particular team, where they have seven players behind the ball and maybe three players in front of the ball. So this is, given the opposition, what's their most likely formation. This becomes very useful, because if you look at where the ball is, you get a much clearer idea of what the formation is than by just plotting the average positions, because the positions can depend a lot on the opposition. So this gives a nice clear idea of what the defensive formation of the team is.

This is something that we do at Hammarby, and what we do is break it down into three parts of the pitch. I wanted to take this one because I think it's a bit cleaner. These are defensive formations for Kalmar playing against an attacking Mjällby, so Kalmar are defending here. When Mjällby have the ball in this area of the pitch, this is how Kalmar are organized: they have three forwards defending here, two in the middle, and then five at the back. You can see how Kalmar's formation changes gradually as Mjällby move the ball up. Here they also have a 3-2-5, or, read from the defense, a 5-2-3, and here again, where it becomes more compact and they move back and more central, a 5-2-3. So it's a very clear formation.

It's also important, and I think that was why I had the first slide, that teams have different formations for defense and attack. This is Kalmar's attacking formation when they're attacking in this direction, and you can see that number 3, the wing-back here, is coming out very wide. He comes back into the middle in the center, but he's coming out very wide, and there is no corresponding wing-back on the other side. So this
gives useful information about their build-up play: when they're attacking, there's one wing-back that goes out wide and another that tends to stay central, or at least they did in that particular match. I think that's one of the important things with formations: they're context dependent. If you just compute the formation over the entire match, you don't get as clear a picture as if you break it down by what the teams do in different thirds and whether they've got the ball or not.

Another thing: in week seven of the course we're going to have a lecture by Suds Gopaladesikan. I got nervous there, because he's always joking about how nobody can pronounce his name. Suds is going to give a talk about loads. It's one thing to tell football players that they should run around a lot, but there are constraints on how many sprints players can do in a match and how fast they can run, and Suds is going to tell you more about this. He sent me this wonderful picture of different cars. What he told me is that the first slide shows how different players have different efficiencies when converting their heart-rate load into locomotion. Different cars have different efficiencies as well, and so do different players. He's going to talk a bit about that and link it to tracking data. He's also going to emphasize something we haven't really touched on much in the course, which is how your role as a data scientist comes together with the sports scientists. There's a lot of analysis of fitness data: heart rate, how many runs you can do in a match. We don't really look at that much in this particular course, but Suds is going to give us a brief rundown of that and how it ties in with the work of a data scientist, and also how you work together with a
video analyst. We've looked a little bit more at that in the course, so he's going to cover some of that.

Okay, and finally, before I go into the course web page, I just want to say something about the limitations of tracking data. I've already mentioned the first one in response to the question: it does not account for body position. There are more studies now where you do have body position, and I think Pascal is going to talk about this a little bit on Thursday. There's a study from Barcelona, who work a lot in basketball and have started working on this. Tracking data is also sometimes error-prone. The biggest error is not so much in the position of the players; it's that you get ID switches between players, where the technology, which is looking for shirt numbers, makes a mistake and switches to a different player. The ball is usually tracked in 2D, so some strange trajectories come up there. You often need to link to event data, which is another limitation; that has been done in the Metrica data but not in the Signality data. And there is a lack of public data. In this course we're working with the Metrica data and the Signality data, and we're also going to look at some publicly available data from Last Row, who posts tracking of Champions League goals and Liverpool goals on Twitter, for example, so you can have a look at that. But generally there's a lack of public data to work with. I think that's something to emphasize: in planning this course I really wanted to put you at the cutting edge of what's going on in this field, and we still haven't got everything together in terms of very reliable data that we can use for all sorts of situations. So you're working at the edge of state-of-the-art data collection, and it still isn't perfect for every type of application we could envisage.

Great, I am going to stop the share there. Are there any
questions before I move on to the web pages for the course? I think I've answered most of the questions up to this stage. If you want to, you can put in some questions while I'm getting this up.

Good. So, the place we're at in the course: we're on the 21st of September. What I've done is put up both of the next two weeks' lectures, because there won't be a live lecture next week, so you can start to work through this material yourself. We already have very good video material on the Friends of Tracking web page: Laurie Shaw and William Spearman, who works for Liverpool now, have made really nice videos stepping through all the stages we need here. So I thought I'd just look into those pages and talk you through what you should be watching. Laurie Shaw starts in the first video with getting started with the tracking data, looking at the Metrica Sports data. All the code is available there in the GitHub, so please download that and start working through it. Then we're here, on the lecture I've just given, looking at tracking data in football. Then you can download the tracking data yourself. The code is here in the GitHub. I updated it this morning because I found a small problem with my coordinates. This is the code for downloading the games. You'll notice an important thing here: the username and password are not filled in. You can get them either from the course web page, as an announcement, or through the Slack channel. This allows you to access the Signality API and download the data. You don't really need to understand any of this code: it just downloads the data and puts it in a folder on your computer, provided you've got the username and the password. Then, in a bit, I'll go through some of the basic
code for plotting. Before you get started, you should download the whole GitHub directory in order to run the code, because of the way it's structured. One thing I've done here is, rather cheekily, put up videos of the matches as Dropbox links. Hopefully these work, and you can download and watch the videos of the matches. This is very useful if you want to know what's actually going on, especially if there are errors in the tracking data: you can check them against the video of the match. I'm not going to start that download now, bad idea. Oh right, it is available to play. Unfortunately the time stamps aren't available on those videos, but the videos of the matches are there. You'll just have to work out the time stamps yourself from the starting whistle and so on.

I do actually have event data for these matches, but if you want to look at a particular match you can go into the Twelve rating system here. For example, Hammarby at home against Malmö was one of the matches, and you can look at the actions for the players. If you're interested in all the shots, you can look at the shots here: Richard Magyar scored with just one shot there, so that tells you the goal is at 87 minutes. This is the second goal, yeah, just one shot and scored. A shot by Rodić there, another shot there. So this can tell you when the different events occurred, but you have to do that manually. There's definitely a bit of manual playing around with this data set to make it usable, so I'm afraid you'll have to do a little of that. It would be nice if we had event data for those particular matches, but we
currently don't have publicly available event data for the matches. We do have videos for comparison, though, so that's an advantage over the Metrica data, where we don't have the videos but we do have event data.

Then the exercise. We'll discuss this in the tutorial on Thursday. There are three parts to it, and I'll go into more detail in the tutorial, but the basic idea is to get a feel for measuring the speed and direction of the different players. You're basically going to be re-implementing what Laurie has done on the Metrica data, but on the Signality data. So, first: write a program which plots distance from goal, speed and acceleration of all the Hammarby players as a short time series, that is, plot these variables over time. The analyst is aware that there are limitations and errors in the data, so provide her with a visualization of the three variables for a few different examples of a 15-second sequence of play. The idea is that you get a sense of how this data works, present some examples, and then talk to her about what the accuracy of the data is and what you can expect from it. I think it's good for you to start working with the data and get a feeling for what you can and can't do with it. Second, I'd like you to look at the distance to the nearest opposition player for each of the Hammarby players during their possession, and analyze certain situations. So instead of analyzing the whole match, as I've mentioned a few times, take certain situations and look at these metrics. And finally, implement your own metrics that you think will help us understand how Hammarby score goals, and use them to analyze your chosen Hammarby player. Again, same thing as
last time: a two-page document containing the non-technical parts, then runnable code, and, also the same as last time, points out of 10 for the exercise.

Great, that's what I wanted to say about what's up on the web page. Oh, I wanted to say one more thing. If we go to pitch control, which is week five of the course, please feel free to dig into this material if you want to. This is William Spearman's overview of pitch control, with a lot of the articles that he cites in it. Then there's Laurie Shaw's description of how you actually build the model, and then I've got a video there, a professionally produced one, looking at applications of pitch control and some of the ways we've used it at Hammarby.

Great, okay, let me have a quick look at the questions again. There's a question about straight passing and curved passing. I realize that in the pass probability model the pass is represented as a straight line. Does that mean it's rare for players to do curved passing? No, it's not at all rare. It's just, again, a limitation of our models that we haven't included any curved passing. Anyone who's interested in working on these types of problems: they're really problems at the cutting edge of tracking data, so there are lots and lots of interesting open problems. The username and the password are in Slack. I think they're in the update channel. Yes, so go into the update channel, subscribe to that, and you'll find them there. Is modeling with tracking data difficult for newcomers? Yes, I suppose this is the hardest part, and it does require more of a physics-based background and a stronger mathematical background. Good question. "Can we upload the hand-ins?" Rodrigo has a more practical question there. Please do
share your hand-ins after the deadline tomorrow, and then you can give feedback in the Slack channel. Even the people who are on the course properly, feel free to use the Slack channel to share your work after the deadline. I'm very keen on you giving each other feedback and talking to each other, but I would like that to happen after the deadline, so that everyone has a chance to work on this themselves rather than copy someone else.

Great, okay, so finally, the last thing I want to do is have a quick look at the code. This is code that was partly written by me but mainly by Fran Peralta, who works at Hammarby, and it's how we actually work with the data. I've got Anaconda here, and I'll make this smaller so it's easier for you to see. As I mentioned, the first thing you need to do is download the data. There are three Hammarby matches or so, so download those first, with the username and password from the Slack channel. The second piece of code then loads in a particular match and visualizes it. Oh yeah, I have to download it again, because I took it away. Whoops, I need to put a username and password in. Okay, I'm going to stop sharing for a minute, put in the username and password, and then download it again. Sorry about this, I've had to take that in and out so many times. It's kind of ridiculous that I'm not putting this into the video as well. Good, so you go into the Slack channel, get the password, put it in there, and then download the data, as I'm doing again now. It takes a few minutes, but you don't really need to understand any of the code in 12 get signality api: it just gets the data and puts everything into a data frame, which it then saves and which you will then use in your analysis. And then once you've done that, well, then this
code gives you a few things to help you plot the data and get going. Again, this is code that Fran mainly provided. It pre-processes the files if required, or, if that's already been done, it just loads up the pre-processed file, and then it allows you to plot various situations. You give it the home team name, Hammarby, and the away team name, Elfsborg. There are a few things it does. First, it gets the players in play. Everything is inside a library here, so this function, get players in play, inside the libraries, finds out which players are on the pitch, handling various errors around that. That's going to be very useful, because the subs are also listed as players. Then it transforms the coordinates. The reason we have this is that Signality often change their coordinates between various systems. In this case, and we can have a look here, transform coordinates takes a zero-to-one coordinate up to a 105-by-68 pitch coordinate, which we're going to use for plotting. I'll run this, so we're just loading in the data. Then the main thing that's done here is that we plot different situations, and again this is inside the pre-process library. We have a function which creates the pitch and plots it in 105-by-68 coordinates. Then plot situation basically just loops through the players one at a time. We've said here that we just want the numbers on the home team, which is the one we're interested in. So the green dots are the Hammarby players, the yellow dots are the opposition players, Elfsborg, and the small dotted line here is the ball position over 10 frames, and the
big one is where the ball actually is at that particular frame, to give an idea of the trajectory of the ball. One of the first things you're asked to do in the exercise is to find the direction of the players and put arrows on them, so the first thing you should do when you start is put arrows on the players showing their directions, so you know where they're moving to. That's basically the code you're provided with: functions to load things up and to plot a situation. Your task is to start measuring the speed, the direction and the distance from goal of all of the players and to make plots of them, basically following what Laurie does for the Metrica data, but for the Signality data.

Great, are there any remaining questions? I'll also try to take some practical questions now, if you've got any in the chat. I think that is everything I wanted to say for today, so let's have a look in the chat. Robert Everett asks: "Is it too late for me to start the course in week one?" I mean, we're in week four, and it's a pretty intense course, but you can always start the course and join the Slack channel. Rodrigo helps everybody to register in the Slack channel.

There's a good question here. Somebody says, "My team is Wellington Phoenix in the Australian A-League." Yes, I've seen that question a few times in the Slack channel as well: what do you do if you're working for a club that doesn't have tracking data or event data? What sorts of things can you do? Well, GPS data is one of the things you can use with respect to tracking. You can look at the speed of the players and start to go down the sports science route. It's not going to be very useful for tactical things, because you don't know where the opposition is, but you can
look at things like whether the team is holding its formation in various situations, so you can start to plot how you're organized. Then there's definitely the fitness data you can analyze. The other thing, for example: we work with the Hammarby women's team, where we don't have much data. We look at things like entries into the opposition's side of the pitch, entries into the final third, entries into the box, and then shots. So we look at how well the team are doing, progressively, across those different tasks, starting from getting the ball over the line, and we just do that by hand: we note every time the ball goes into the opponent's half. We can also look at regaining the ball, so we can collect event data on those types of things. So my answer is that you find out what the coach really wants to measure, and you use handcrafted event data to try to answer those types of questions.

One thing that I have not really talked about, but which is quite important here, is that many clubs don't use the event data we talk about. They don't use the Opta event data; they have their own custom-made event data, for example for running into space. So you can have off-the-ball event data, and that isn't publicly available because each club collects it separately. If that's what you're interested in, collect it. This often happens with professional scouts: they just note the times at which the player they're interested in runs into and creates space, so they record the particular metrics they're interested in, and you can do a lot of the analysis we do on those types of metrics as well.

There's a good tip there. I don't know about this BirdsPyView. "If you have a lot of video footage and time on your hands." Oh yeah, I take it that what Lovrey means by mentioning BirdsPyView here is that you can actually do hand tracking. Yeah, there's a
couple of tools which I've seen that are available where you can do your own tracking by noting the position of the box and where the players are; the tool then projects it down onto the pitch. I did a lot of that when I was first working on Soccermatics: because I didn't have any of the data, I'd actually trace the positions of these things by hand. So yes, you can certainly do a lot of stuff by hand. The key to doing things by hand is that you work out before you start what it is you want to measure, and then you don't spend a lot of time measuring things you're not interested in. So decide what you're interested in measuring, measure that, and you'll be successful.

You can do the same thing, and now I'm just babbling, with passing networks. I train an under... well, I don't train them anymore, they're under-15s now and I train them in futsal. But I used to make passing networks for them when they were nine or ten, and they'd love to see, in particular, if two players were just passing to each other all the time. That could be a good thing, if they were really working well together, but it could also be a bad thing, if it was just those two players playing to each other. To show it to them visually, all you need to do before the game is write up the seven players, as it was at that time, on a diagram in the formation you're playing, and draw a line every time they pass to each other. Then you actually have some very nice passing data.

The tool... yeah, the one thing I've seen, I remember, is by Ben... I've forgotten his second name now. Let me try and remember that afterwards. If anyone else knows this: there's a tool by Ben... his name has gone out of my head, I'm afraid. I'll put in a link afterwards where you can do that type of thing. And also the guy who does Last Row, he is also making, and maybe that is the bird's
eye view, he has started to make lots of tools for doing this type of thing, and that's how he collects the data. You basically need to know where the penalty area is, because that stays still, and then you place the other things relative to it. So have a look at the Last Row page on Twitter and find the stuff he's been putting out, which is really nice stuff for using broadcast data.

That's maybe another thing I should mention: the data that we've discussed so far... no, it's not that Ben, everyone's called Ben. Oh, he used to work for Analytics FC, they were called, and now I've forgotten. Sorry, I got interrupted. I was saying: the data that we've looked at today is taken from the tactical cameras at the ground, so it's the club data. But there's more and more AI work using tracking data from broadcast TV footage, and this is a really big growth area that clubs are very interested in. They only have the tracking data for the league they play in, but if they want to scout a player using tracking data, they need it for other leagues. So there are several companies who are using broadcast footage from matches in order to track the players and provide clubs with tracking data from every league. There's one company called SkillCorner and another called Sportlogiq who have really mastered that art. I think that will be a bigger and bigger thing, just because tracking data isn't widely available, and I think more and more tracking data will become available because those companies are working with broadcast data.

Yes, it's Ben Torvaney. So if you search for Ben Torvaney's web page... I love all the Bens that are coming in here. The correct answer to the Ben question today is Ben Torvaney, who I've met several times. Really nice guy, he did some of the best early work in football analytics, and you should
look at his web page. He has a lot of free tools available, and one of the ones he has is a clickable map for tracking particular actions. There are other things here: people have made apps, and there are lots of apps for those types of things. I think at one club in Sweden they made an app mainly to calm the parents down, because when you've got parents watching 11-year-olds playing football... and I subscribe to this: you shouldn't be shouting all the time at 11-year-olds who are trying to play a game of football, telling them things. So what the coaches did is give the parents an app where they could click who was passing to whom, and it kept the parents busy and stopped them harassing the children on the football pitch. I think that's a very good tip: give the parents some of the work to do. It's the same thing that we do at Hammarby: we employ some of the fans, or not employ, but we give the fans an opportunity to contribute by collecting data. Provided you have dedicated enough fans who don't mind sacrificing a bit of the joy of seeing the match, you can collect that type of data very reliably.

Great. Okay, now I'm just blabbering a lot of random stuff at the end of this lecture. This has been an introduction to tracking data. There's not going to be a lecture of this type next week; there is lots of material for you to start going through online. I'll make a chat group so that you can ask questions about the next project. I've moved the deadline a little bit forward, so you've got three weeks to complete the tracking data project and send that in. I'll meet the students in the course at the tutorial times, and I'll see the rest of you in the Slack group. Great, thanks a lot. Bye-bye.
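The exercise described in the lecture (measuring the speed, direction and direct distance from goal of each player) could be approached roughly as follows. This is only a minimal sketch and not the code provided with the course: the function name `player_kinematics` is hypothetical, and it assumes positions have already been converted to metres with the origin at the centre of the pitch, a frame rate of 25 frames per second, and a goal at (52.5, 0). All of these should be checked against the actual data set.

```python
import math


def player_kinematics(xs, ys, fps=25.0, goal_xy=(52.5, 0.0)):
    """Per-frame speed (m/s), direction (degrees) and distance from goal (m).

    xs, ys  : one player's positions in metres, one value per frame
    fps     : frames per second of the tracking data (assumed, check your data)
    goal_xy : centre of the goal being attacked (assumed pitch layout)
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two frames of matching x and y positions")
    dt = 1.0 / fps
    n = len(xs)
    speeds, directions, distances = [], [], []
    for i in range(n):
        # Central difference in the middle, one-sided difference at the ends.
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        vx = (xs[hi] - xs[lo]) / ((hi - lo) * dt)
        vy = (ys[hi] - ys[lo]) / ((hi - lo) * dt)
        speeds.append(math.hypot(vx, vy))
        # Angle of the velocity vector, measured from the positive x axis.
        directions.append(math.degrees(math.atan2(vy, vx)))
        distances.append(math.hypot(goal_xy[0] - xs[i], goal_xy[1] - ys[i]))
    return speeds, directions, distances


# Usage: a player running straight towards the goal along the x axis.
xs = [0.0, 0.2, 0.4, 0.6]
ys = [0.0, 0.0, 0.0, 0.0]
speeds, directions, distances = player_kinematics(xs, ys)
print(speeds, directions, distances)
```

The velocity components computed inside the loop are also what you would use to draw the arrows on the players. Raw frame-to-frame differences are usually noisy in real tracking data, so some smoothing over a few frames is typically applied before plotting.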
Info
Channel: Friends of Tracking
Id: fYqEnoOV9Po
Length: 103min 20sec (6200 seconds)
Published: Tue Sep 22 2020