Randomness and prediction of matches

Hello, welcome to the third lecture in our series on mathematical modelling of football. Today we're going to look at simulating matches, and in particular we're going to look at what I think you can say is the most important mathematical idea behind football, and that's the Poisson distribution. Last week we looked a lot at expected goals, and that's the thing you maybe read about the most as statistics applied to football, but I think what we're going to cover today, the Poisson distribution, is more fundamental. If you like to think of the idea of mathematics underlying things, I think the Poisson distribution is the basic thing that underlies football. You might not be familiar with the Poisson distribution; I'm definitely going to explain a lot about what it is today.

What it's related to is the memorylessness of football. Often when people talk about football they talk about the build-up of the game, how the game progressed over time, and the feelings and things that happened which led to the picture of the match. That's valuable, but when you actually look at football from a mathematical perspective you find that it really doesn't have that much memory. The goals can occur at any time during a football match and they pretty much come at random. They come at random at different rates for different teams, which are better and worse than others, but they pretty much come at random throughout the match. There are a few caveats about that, but that becomes our basic mathematical principle whenever we're modelling anything to do with football. It's the principle that we'll use today for simulating teams and matches, but it's also a principle that we'll use later when it comes to simulating match situations. The whole idea is that you can't see too far into the future in a football match, and that might seem to make it random and difficult to
predict, but actually the fact that we can't see too far into the future makes it more amenable to mathematics rather than less. Okay, I've started off with a few abstract things; I'm going to make it more concrete as we go on about how we're going to use the Poisson distribution. I'll get my screen shared here, and I think I need to click that, yeah.

So what's the structure today? Well, we're going to look at randomness in football. I've already said a little bit about why football's random. It's a difficult thing for me to say that football is random. When I did a lot of interviews around Soccermatics, people asked me how random football is, and I think there is a rough rule of thumb which you can relate to: two-thirds of football is randomness. If you think that there are typically about three goals in a football match, one goal goes to the better team and the two goals are basically randomly shared between the better team, who might win three-nil, or the other team, who might win 2-1. So that's a good way of roughly imagining, for a Premier League match for example, how the outcome might be described: the one goal is non-random and the two goals are random, so it's about two-thirds random. I know that in The Numbers Game they came up with a half of a football match being random, and there are various studies showing that kind of thing. I don't think you can put an exact number on it, but let's be sure there's lots and lots of randomness in the outcome of football matches, not necessarily in what the players are doing. That randomness leads us to the Poisson model, which is a great description of when a small number of random events occur, and we know there's a small number of goals in football. Then once we have the Poisson model, we're going to use that to try and find signals for good football. Quite a few questions came up
in the first lecture, particularly, about how I was creating different heat maps and different descriptions of, for example, pressing, defensive actions, passing and so on. But what I didn't try in those lectures was to say, well, this thing's good and this thing's bad, and that's because it's really, really difficult to say what's good football and what's bad football. But if there's one way we're going to tease it apart, the Poisson model is the way that we're going to do it.

Then I'm going to do a bit of a code walk-through. I haven't put the link in the YouTube, but if you go into the course web page, I've put up the link to the new code on the GitHub, which you can definitely sit there and run in parallel while I'm waffling on. I've also put up the lecture notes for today's lecture. I put a couple more things in since I put them up, but most of the lecture notes for today are up on the course web page. If anyone's wondering where that is, maybe someone who knows it can share it in the chat. I checked that we can have access, so I think you should be able to get into both those things, the code on GitHub and the lecture notes, if you want to have them in parallel. If you're getting bored of me talking and want to skip ahead and see what's coming next in the slides, you can do that as well. I'll have a break from 11 to 11:15.
Please ask questions in the chat where possible; I will try and remember to break off and answer them on the way. I always say this: I'll take a reasonably slow pace, then I get carried away and start taking a higher pace, but I'll try and take a reasonably slow pace and interact with you along the way.

Okay, so when are shots taken in a football match? This is one figure. Now I've actually forgotten which league it was; I think it's the same as that one, yeah, so it's the Premier League. This is the second half in the Premier League. What I've done is I've taken the Wyscout data and I've basically put it into boxes from minute zero up to minute 48 in Premier League matches for the second half, and I've measured how many shots occur within those minutes. The main takeaway message here is, if you look at this line, this is the average number of shots throughout the whole match. I've cheated a little bit here: I haven't actually measured the length of all the matches, I've assumed that matches are on average 48 minutes long. If you really want to do this properly you might want to make an adjustment for that. But this is the average number of shots per minute, oh sorry, the total number of shots per minute that occurred in all of those matches; the average of course needs to be divided by the number of matches. What you see, for the main part, is a kind of fluctuation around this average. It isn't that there's one minute where all of the action happens. Maybe you can say that in the first minute of the second half there are less likely to be shots, the second minute maybe, but after that, really up to the final whistle, there are equally likely to be shots in a football match. Now I think that this is quite surprising when you haven't seen this statistic before, because we do tend to think, and I even think when I'm
watching a match, that there's going to be some sort of exciting chance. I have this friend who always talks about the "last big chance", and when we watch football together he's sitting there anticipating what he believes is the last big chance that every team always has. He always says that this will happen in the last three or four minutes. He's happy if it's his team who's losing and they need to have this last big chance; when he imagines that this big chance has occurred, maybe in the 46th minute, after that he gives up all hope. But when you actually look at the stats, there really doesn't exist this last big chance. There is throughout the match a relatively even number of shots, apart from maybe in the first minute.

Then you can also look at that for the first half and the second half. I was actually quite surprised, because I've done this before and I haven't found this relationship, but I found that there does seem to be a slight difference between the first half and the second half: there are fewer shots in the first half than there are in the second half. I think that's due to two reasons. One is that the matches start slowly, so in exactly this area here, I'm going to put this laser pointer on, in this area here we have fewer shots, so when the match starts the teams come out a bit slower. And then I think there tends to be more extra time in the second half than in the first half, so that's another part of the explanation. But the difference is reasonably small: in the first half there are around about the same shots as there are in the second half. I think if you were a coach and you were trying to exploit something about this figure, this is what you should be thinking to exploit, because here it seems that there is some chance of getting more shots. There is no reason, apart from that the ball starts in the middle of the pitch, but it does that for a lot
of the game, there's no reason that you can't start to have shots, certainly by the second minute in the game. But the overall thing I want to emphasize is that, you could take the first and second half separately, but you have this reasonably, it's a normal distribution of the number of shots that you have per minute, and it's pretty constant throughout the match.

Then when we look at goals, we actually find something similar. In this particular data set there were more goals in the second half than there were in the first half, but they come pretty much any time during a particular half. There are of course fluctuations here, and the fluctuations are bigger because the number of events is smaller, but there isn't really a clear pattern where there's an increase in the number of goals coming as the match goes on, and neither in the first half is there an increase in the number of goals. Again, maybe in the first or second minute there's some pattern, but otherwise goals can come pretty much any time during a football match. Again I want to emphasize that, possibly, if you're working in betting and you want to find a small edge to exploit about this pattern, then there might be something for you to find there. But for nearly all other purposes of modeling football, the best assumption is to say that the goals come at any time during a football match, because that's just going to make everything to do with our modeling so much easier. That's a Premier League season. I did some other things with it: I did the Bundesliga and La Liga and got similar patterns to this one. The code's in there; we'll have a little look at it later.

Okay, so I like to make this figure because I think it brings home the point. If you divide the match into, let's say, a hundred boxes, this is a hundred-minute match, a bit longer than we might expect, but imagine you have a hundred-minute
match and you say that one box is equal to one minute. What you're essentially doing is saying that goals occur more or less randomly during a match, and typically a team scores 1.35 goals, just less than one and a half goals per match. The way you think about this as a model is you basically think about just throwing these balls, these random goals, down at random inside the boxes. Sometimes for a team you throw one ball, sometimes you throw two balls, sometimes you throw three balls, but you basically throw these balls down randomly in the time slots, a bit like the time slots I had in the histogram. So you might have two goals coming into the match at random points, you might have one goal that you've dropped in at random, you might have two goals over here. That's the basic idea of our model: we're dropping goals randomly into boxes, and from there we can actually build a mathematical description of football.

For a documentary I was asked to look at what the chances were of Bayern conceding two goals in injury time in 1999, in this famous match against Manchester United. I rewatched this match and listened to a couple of statements from the commentators. I really like this one: "I tell you what, if they could equalize, and I'm not betting against them, I think they'll go on to win it", which I think actually goes against exactly what I've just said. If a team does get that random goal, there's no reason to believe more strongly that they'll get the next random goal, other than that they might be a slightly stronger team. And of course it ends with "nobody will ever win the European Cup final more dramatically than this", because Manchester United scored two goals in injury time. I'm going to say that they were within one minute; they were maybe
within one minute of playing time, but they were actually within two minutes of each other. For the sake of argument, let's say that they were within one box of my model. So basically you can see this Champions League final as a load of empty boxes and then suddenly, pop, pop, two goals just occurred in the same box, in the same minute of this match. Of course it's very unlikely that they both occur in this way, but we also watch a lot of football matches where different things happen, and now and again things like this will happen. So the fact that you get these two goals at the end of a Champions League final doesn't completely dismiss the idea that the goals occur at random. It just happened that in that particular match both of the goals happened in this 99th minute. I think I need to emphasize this over and over again, because it's a fundamental thing about mathematical modeling: this is our assumption. It's not that I definitely believe that Manchester United only won the Champions League final because of randomness; it's just that it becomes a reasonable assumption, and very little that we observe in football contradicts that assumption. Even the Champions League final doesn't necessarily contradict that assumption.

You can of course go to the other extreme; I did this because I thought it was fun. Both goals occur in the same minute, or within a minute of playing time of each other: the probability of that is one in a hundred, because I've got 100 boxes that they can occur in, and there's a one in 100 chance that they both occur at the same time. And then they both occur in the second-last minute: well, that's a one in 10,000 chance, because you then specify which minute you want to have. You can actually go a little bit further and you can say, well, what if they actually both came
from a corner, and then you start to get down to, like, this is a one in a million chance. So this is a kind of opposite way of seeing it to what I've just described: rather than seeing it as random, we see what happened and then we calculate the probability of that particular thing happening. I want to emphasize this is a just-for-fun calculation using probability. What we've looked at here is the more serious model of how randomness can fool us, in a way, or how we can control randomness: we see football matches as balls falling into boxes, and then we can actually manage to build up a model of those things. So yeah, one in a million: if you'd asked me before the match to calculate the probability that Manchester United would win with two corners within a minute or two of each other, I would have probably come up with one in a million. But given the amount of football that's played and the different things that can happen, all of these different things happen. I suppose it's the same with the Champions League, with Liverpool coming back from three goals down: Liverpool get three goals in the second half, a very unlikely event, but consistent with the idea that goals occur randomly in football. So often when we have these very strong stories about what happened in a football match, they don't lie so far away from the randomness that we expect to occur in football matches.

Fun calculation, yeah, I forgot I put that on there. It's a calculation based on if you asked me before the match, but of course after the match this thing did happen, so it's no longer a one in a million chance; it actually happened, the chance was 100 percent. But there is serious math behind it. We break things down into boxes and we're going to put these balls down, and here is the serious math, then. So what's the probability of k goals occurring in a football match? Now, before I was looking at one
team and I said that was 1.35 goals. This number, 2.7: in some leagues it's 2.5, in some leagues it's 2.8, it varies a bit, but I think in one Premier League season that I was looking at there were 2.7 goals scored per match, and I'm going to assign them randomly between 95 minutes, which might be typical for a football match. Neither of these numbers is absolutely cast in iron, but they're roughly what we see in typical football matches.

Okay, so what is the probability of k goals? The way we work it out is related to the box model. We see, in each of these boxes, what's the probability that it's filled with a ball. The probability that a particular box is filled with a ball is 2.7 divided by 95. In this particular case I've got 95 boxes, not the 100 I have here, and I say that each of them can be filled with a ball with this probability, 2.7 over 95. Out of these we're going to fill k of the boxes with a ball. 2.7 divided by 95 is the probability that any one box is filled with a ball, and the probability that k boxes are filled with a ball is where you just multiply all of those together. But we also have to account for the boxes which aren't filled with balls, and that's one minus 2.7 divided by 95. This is the probability that a particular box is empty. You see, the probability that a box is empty is bigger, of course, than the probability it's occupied, and this one minus 2.7 over 95 is going to be some reasonably large number. This number is about one in 30, something around this, so this is 29 over 30.
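
The per-box probabilities just described can be checked numerically. The following is a minimal sketch, not code from the course GitHub: the 2.7 goals and 95 boxes are the values quoted in the lecture, while the variable names and the helper `prob_k_given_boxes_filled` are hypothetical and chosen only for illustration. It suggests the fill probability is closer to 1 in 35 than to the rounded "about one in 30" quoted above.

```python
# Per-box probabilities for the box model.
# Assumption: 2.7 goals per match and 95 one-minute boxes, as quoted in the lecture.
mean_goals = 2.7
n_boxes = 95

p_filled = mean_goals / n_boxes  # probability that a given box holds a goal
p_empty = 1 - p_filled           # probability that a given box is empty


def prob_k_given_boxes_filled(k: int) -> float:
    """Hypothetical helper: probability that k specific boxes are all filled.

    This is just p_filled multiplied together k times; it does not yet
    account for the empty boxes or for which boxes are chosen.
    """
    return p_filled ** k


print(f"p_filled = {p_filled:.4f} (about 1 in {1 / p_filled:.0f})")
print(f"p_empty  = {p_empty:.4f}")
# Sanity check: boxes times fill probability recovers the mean goals per match.
print(f"expected goals = {n_boxes * p_filled:.2f}")
for k in range(4):
    print(f"k = {k}: {prob_k_given_boxes_filled(k):.6f}")
```

The sanity check is the useful part of the sketch: whatever number of boxes is assumed, the fill probability is chosen so that the expected number of goals stays at the observed mean.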
And that's to the power of 95 minus k, because there are 95 minutes and then we take away k as the ones that are filled with a ball. Then the last term we have is the combinatorial one. This is 95 factorial divided by k factorial times 95 minus k factorial. I should have written this out, but this is, what do you call it, the choose function, 95 choose k, which, just to say it again, is 95 factorial divided by k factorial times 95 minus k factorial. That says, well, when I've dropped all of these balls into the boxes, I have to actually choose the boxes at random. And this is the binomial distribution, which is used to describe exactly this type of problem of dropping balls in randomly.

Okay, so I had 100 boxes here and I had 95 boxes here. Here's the question: why have I divided the match into minutes? What's special about minutes? I could use two minutes, I could use 30 seconds, I could use one second, I could use 10 seconds. Why do I make my boxes minutes? There is actually no good reason to make my boxes minutes, other than it's something that we understand very well. But we can think a little bit more about this question, and the question really here, and I'm not going to give you a complete answer, is how many seconds into the future can we predict a football match? Can we predict it one minute into the future? Can we predict it 10 seconds into the future? Can we predict it one second into the future? That's basically the question of whether you can predict when a goal will occur in this case, because it's not just predicting anything about the football match, it's predicting that a goal will occur within a one-minute time frame
into the future. Now, if you happen to believe that you can predict a football match more than one minute into the future, you can actually become extremely rich, because you've got time to place a bet on who's going to score next. If you've got some sort of method of doing it, and that method is better than the bookmakers' method, which is based on the Poisson model I'm going to talk about, then you can get extremely rich. Ten seconds into the future: possible in some situations. It's possible when there's a penalty, for example; then you can make better predictions that there'll be a goal in the near future if a penalty's just been awarded. But outside of there being a penalty, it's pretty difficult to predict even 10 seconds into the future. One second into the future, I can kind of believe that it's possible to predict what's going to happen in a match: you've got a one-on-one situation, you can certainly get a much better than random estimate of whether there's going to be a goal or not. But basically this number here determines the number of boxes, so it should be somewhere, I think, between one minute and one second. To get the number of boxes, you take 90 or 95 or 100, however many minutes you've assumed in your football match, and divide it by the time scale that you believe you can predict into the future, and I think that's somewhere between one second and one minute.

So I've said here 2.7 goals scored per match, randomly distributed between 95 minutes, but then we can rewrite this equation by saying, well, whatever boxes you come up with: if you think it's 10-second boxes, then it's 95 multiplied by six boxes that you should have; if you think that you really can't do it even on one second, then it's 95 times 60 boxes that you have. But we have these n boxes, where 2.7 goals are scored per match, randomly
distributed between the n boxes. Again we use the choose function here, n choose k; this is the probability that a goal occurs in a particular box, this is the probability that a goal doesn't occur in a particular box, and we multiply these together in the binomial distribution. Then we can do quite a nice thing. There's a nice mathematical result that helps us, because it's sort of complicated to calculate this binomial distribution. From a mathematical point of view it's kind of unsatisfying that we have this massive n factorial thing here, and there are several reasons we don't want to use it, not least because lots of mathematical machinery is not built for dealing with these types of discrete distributions. But we can do the following thing. I'm going to go through this step by step and you can have a look at it later. I've stolen this from a Caltech math lecture, which saved me writing it out myself. It is a very nice derivation of what's called Poisson's limit, also called the law of small numbers, which goes from the binomial distribution here, which is exactly what we saw on the last page, to the Poisson distribution, which is shown here.

What we do is start with the binomial distribution. This is where we expand it out, this is what I should have done to make it clear: n factorial divided by n minus k factorial gives n multiplied all the way down to n minus k plus 1. Then we divide by k factorial. Here we have the mean, mu, which is 2.7 in our case, and n is our number of boxes. But again this is just the same form, so this is just a simplification. Then what we can do is take this n to the power of k and divide through all of these terms by n. We've got a k factorial on the bottom, we've got mu to the power of k, and we've got this thing over here, one minus mu over n, to the power of n minus k. That's the same thing as we had
before. We rewrite this thing again, not necessarily simplifying it, but turning it into something that we'll be able to deal with when we take a limit. We're going to take a limit as we get n smaller and smaller, sorry, n bigger and bigger, because n bigger and bigger corresponds to more and more boxes, so more and more randomness in football. We write this as one minus one over n, because what's going to be really nice here is that as n goes to infinity this will disappear, as n goes to infinity this will disappear, as n goes to infinity this will disappear. We've got this k factorial underneath, and we've left this term because we also know what's going to happen to it when n goes to infinity. Then we let n go to infinity in all of these. So we're allowing n to go to infinity, that's the number of boxes going to infinity, and this corresponds to less and less predictability in football. Now of course this isn't exactly true, because I admitted that on a one-second interval we can start to actually make predictions in football, but it's nearly true: the n you get to, for example 60 times 95, is pretty big, so in practice this thing is going to work.

When we let n go to infinity, all of these terms become one: this term becomes one, this term becomes one, this term becomes one. I have a one here. K factorial is not affected by n going to infinity, mu to the k is not affected by n going to infinity. This thing here, well, you've got mu divided by n and n goes to infinity, so this thing just goes to one. And then this last part here is probably the harder one to work through, but actually, by definition, the limit of one minus mu divided by n, to the power of n, is e to the minus mu. So you get e to the minus mu, and finally you're left with Poisson's distribution, which gives you this probability of scoring in terms of your
original parameter mu, 2.7, and k, and you'll see that n has disappeared, which is very nice. So we don't need to think about our boxes anymore: provided we can write our football game in terms of enough boxes, we get the limit we want, which is a lovely result discovered by Poisson many, many years ago and also known as the law of small numbers, which I think is a lovely thing. The small numbers here are the small number of goals in football: there is a small number of goals in football, and that allows us to make this derivation of the Poisson distribution.

What does this mean in practice? Well, I told you this equation for 95 minutes, then I looked at this equation for n, and in practice this means that the number of goals in your match will be described very well by this equation: 2.7 to the power of k, times e to the minus 2.7, divided by k factorial, will give you the probability that you'll have a particular number of goals in the match. This is a bit geeky actually, but I actually thought about this a lot: 2.71 is very close to the number e, so e is approximated by 2.71. The fact that you have 2.7 goals in football means you can write down this equation, which will work very nicely for a lot of goals in football. You see that this is actually a parameter-free equation, because e isn't a proper parameter in maths, e is a universal constant. So you can write down the number of goals in football just using universal constants and k, which is the number of goals, the thing that you want to estimate. So e to the k, times e to the minus e, divided by k factorial, with no parameters whatsoever, will give you a fundamental model of football. I am joking a bit here, but this is sort of as fundamental as gravitational models or quantum theory models. This is the fundamental equation of football, where you don't have any constants
whatsoever in describing the number of goals that occur in a football match. But I do want to be clear that that is also because of the coincidence that 2.7 goals on average happen to be scored during football matches. I think; maybe it's not, I don't know.

So okay, that's been a long derivation, but this is what I think is just incredibly cool about it: it means that we can get a whole goal distribution recovered from the mean number of goals in the match. We have the mean number of goals in the match, it's 2.7 or it's 2.5 or whatever, and using this equation that allows us to plot the entire curve of likely outcomes of matches during the season. So this is number of matches plotted here and this is number of goals. The black curve here is the Poisson distribution, which gives a reasonably good estimate of the number of goals scored in different matches. So we can recover the whole goal distribution from the mean number of goals. This is one, I think actually I might have included the wrong figure there, so I'm just going to check that, but this is one I did from the Wyscout data. This is the proportion of matches with a particular number of goals. We've got the Poisson curve here, and you can do this yourself for the goals per game. We've got the Poisson distribution there, we've got the data here, and it's not a perfect fit, but it's very close to how many goals you get. There's one here which is overestimated: one-nil seems to be overestimated in La Liga. If we look at the Bundesliga, one-ones are possibly over-represented here, so for example more draws than we might expect and fewer one-nil results, but pretty much consistent with the Poisson model.

I think the last thing I want to say before I go over to, because what I'm going to do next is
how we're going to use this. One thing I want to do last, before I go and have a look at your questions and we get into the break, is to say that the Poisson distribution is fundamental to football. If you're interested in basketball, or a lot of other sports, then it's actually the normal distribution which is the fundamental distribution. The reason is that we have the law of small numbers for things that don't occur very much, which is goals in football, and for the things that occur a lot we have the law of large numbers, and that's where the normal distribution comes in. The normal distribution is what you get if you add up loads and loads of random things, so if you add up loads and loads of random scoring in basketball matches you get a normal distribution. I made this one for my latest book. I looked at basketball and found very nicely, for NBA games in 2018-19, and I'm sure you can do this for other seasons, that the normal curve, this bell-shaped curve, very accurately describes the number of points scored per team in each game. That's because basketball is the sum of lots and lots of random events, while football is the sum of very few random events. The law of large numbers gives you the normal distribution; the law of small numbers gives you the Poisson distribution.

Good. What I'll do now is stop this, and if you've got any questions please type them into the chat and I'll see if I can answer them. Good, so rna says you could argue that the ideal box size is the run time of an average single possession. That's a good starting point, I think. I haven't gone into this, but I did think of one thing to mention: what you would do if you really wanted to find out what is the perfect box size for football is use the Lyapunov exponent. You can look it up on Wikipedia. I think I
need to lyapunov exponent so to do this and this is a really lovely project for someone i've never seen this done before is that you take possession chains sequences of possessions and you use the lyapunov exponent to work out where the ball is in the future and the degree to which you can predict where the ball is in the future on some time scale gives you this constant it tells you the number of boxes so that's my answer to animes l planet 2009 2008 does the data really suggest that goals are independent i guess scoring the first goal influences chances yeah i haven't really gone into exactly that again if you're interested in investing there is some way of doing that but surprisingly not that the really the best you've always got to out compete the base rate of scoring so if a team gets a one-nil lead against manchester city then the commentators will describe it as like manchester city are really laying on the pressure they're trying as hard as they can now blah blah blah football players always try as hard as they can or they nearly always try as hard as they can sometimes it works sometimes it doesn't manchester city's base rate of scoring as we're going to see soon is incredibly high so they're very likely to score so it can happen that these sorts of things happen psychologically but what's surprising actually is how little that type of psychological thing happens compared to the amount of time that people spend talking about it yeah i mean you know again tepho asks the same question about you know is there something non-statistical so one thing that i always emphasize here is you have to be able to measure it right so what do you mean when you say is there something in the mentality of the teams playing so where do you measure that mentality so we have a very good model of football which involves it being random it works for lots of things and so unless you work
for the team and you find some way of measuring their mentality and even if you're working with a team what exactly are you measuring after the match they will say they didn't have the right mentality the players will say oh we didn't have the right mentality because we lost and if they win they'll say yeah it was our mentality that brought us through so there might be a way of doing it but you always have to think as a data scientist what can i measure and what can't i measure and mentality is one of the things that's very very difficult to measure you can measure for example if particular players are managing to pass to each other more successfully so they're connecting well but then mentality maybe isn't the right word to use sarthak asks can the law of large numbers be used to estimate the number of short and long passes in football yes that's a good application of the law of large numbers i haven't really gone into the detail but one of the exercises i've left in the github is to use the normal distribution to estimate the number of shots in the match number of passes is going to be really well modeled by the normal distribution okay so tfo asked now is it following up for this the first ten would be cup finals seven occurrences that lead yeah so it's quite interesting i'm going to talk about the 538 model they do include that if it's the cup final how much is on the match how important is this match to the teams is something that they put in so that type of mentality you can definitely do we've done a project for example we did a project at hammarby looking for big match players and players who over perform and underperform when they're playing against better and worse teams so you can actually use that at the level of the players possibly could you comment on the dixon goals model versus the dixon coles model is poisson regression
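Going back to the law of large numbers point above, here is a minimal sketch of why rare events like goals need the poisson distribution while frequent events like passes can use the normal curve. It assumes, purely for illustration, that both counts are poisson distributed, and the figure of 400 passes per team per match is a made-up number rather than one from the lecture or the data. The function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    # Computed in log space so that large means do not overflow.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def normal_pdf(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def relative_gap(mean):
    """Largest |poisson - normal| over the counts, relative to the poisson peak.

    The normal curve is given the same mean and variance as the poisson.
    """
    upper = int(mean + 10 * math.sqrt(mean)) + 1
    pmf = [poisson_pmf(k, mean) for k in range(upper)]
    gap = max(abs(p - normal_pdf(k, mean, mean)) for k, p in enumerate(pmf))
    return gap / max(pmf)


goals_gap = relative_gap(2.7)     # goals per match, the figure quoted in the lecture
passes_gap = relative_gap(400.0)  # hypothetical passes per team per match

print(f"goals:  normal curve misses by up to {goals_gap:.1%} of the peak")
print(f"passes: normal curve misses by up to {passes_gap:.1%} of the peak")
```

With a mean of 2.7 the bell curve is visibly off, while with a mean in the hundreds the two curves are close, which is the sense in which the normal distribution takes over once events are frequent.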
so you're right i haven't got that as a reference there but the dixon coles model is precisely that it's a poisson based regression model so dixon coles uses poisson to work it's great that you bring up dixon coles because i don't explicitly mention that paper but it's one of the key papers using poisson regression great okay so vincent lisa is a fundamental equation of football you provided is quite interesting yeah i think that's really fun i don't know how seriously it should be taken because if the games were 45 minutes or something like that then it would be different great i'm going to have a break now i'll put up the slide for the break and i will be back very shortly okay see you in 15 minutes good so before the break we went into some depth about the poisson distribution and how wonderful it is and how much it captures of football because of the inherent randomness in football what we're going to spend the rest of the lecture on today is basically using that to construct a statistical model which will allow us to test various hypotheses someone asked about the dixon coles model and that was a brilliant question because i realized that i'd forgotten to give the original source of where this was used which is the dixon coles model so i just put that in during the break so we'll get to that very quickly then that was a normal distribution if we were interested in any other sport than football and i mean it's marginal when it comes to hockey we go from poisson distribution up to normal distribution if there's a sufficient number of goals but most sports have a sufficient number of points in them that we use the normal distribution but poisson is what we use here okay and so what we can do well i'm going to come to how we do it over time but one thing that we can do quite simply is before we start fitting the model let's just assume
that you think that manchester city score two goals on average and everton score one goal on average this is from something which i've done the link is at the bottom i did this for a few years a few years ago and city were going to play everton and just to illustrate again the power of the poisson distribution if i assume that city score two goals so the poisson parameter for city is two and the poisson parameter for everton is one then i can actually simulate all different results it's not even simulation it's just picking out from the equation we gave here so we put two in for city and one in for everton putting those in we can actually pick out how many goals the two teams will score and we can quite easily just make a graph like this so this isn't based on even simulating anything in the sense that we run a computer simulation it's just based on sticking the parameter 2 into the poisson distribution for city and sticking the parameter 1 into the poisson distribution for everton and then making a list of the different results so five nil is quite unlikely because the average goals is two while something like two nil is much more likely and see in fact two nil is the most likely score in this case even though 2-1 was the average goals that we would expect from the two teams so this gives situations where city wins where it's a draw and situations where everton wins all directly from the poisson distribution but what i've done there is i've actually assumed in advance i know that city score two goals on average or i expect that from the match in order to calculate the probability what we'd prefer to do is estimate that from data from previous matches this one is based on estimates of data from previous matches and this is something i did in soccermatics i took the scoring rates of clubs during 2012-13 then i simulated what might happen in the 13-14 season
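The city against everton calculation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the code from the course github: it assumes the two teams' goals are independent poisson counts with rates 2 and 1, it cuts the table off at 10 goals per team, and the function names are made up for this example.

```python
import math


def poisson_pmf(k, rate):
    return math.exp(-rate) * rate ** k / math.factorial(k)


def scoreline_probabilities(home_rate, away_rate, max_goals=10):
    """Probability of every scoreline up to max_goals for each team.

    Assumes the two teams score independently of each other.
    """
    return {
        (h, a): poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }


# City are given a rate of 2 and Everton a rate of 1, as in the lecture.
probs = scoreline_probabilities(2.0, 1.0)

home_win = sum(p for (h, a), p in probs.items() if h > a)
draw = sum(p for (h, a), p in probs.items() if h == a)
away_win = sum(p for (h, a), p in probs.items() if h < a)

print(f"city win {home_win:.3f}  draw {draw:.3f}  everton win {away_win:.3f}")
for (h, a), p in sorted(probs.items(), key=lambda item: -item[1])[:6]:
    print(f"{h}-{a}: {p:.4f}")
```

One thing this makes visible: with rates of exactly 2 and 1 the scores 1-0, 2-0, 1-1 and 2-1 all come out equally likely, so which of them ends up on top depends on the precise rates that are plugged in.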
on the basis of that and you can see manchester city did win that season and you can see in some of the simulations they win here we have liverpool second chelsea third this is just a selection of the simulations in order to illustrate that point we get these types of results out of it so we could also get based on the scoring rates from the previous season that liverpool might have won that season and these things give reasonably convincing models of what happens over an entire season but this is what i wanted to emphasize i'm going to come back to how we actually do these simulations in a bit but what i want to emphasize beforehand is these models aren't perfect they tend to underestimate the best teams a little bit if you notice here city won by far more points than 73 in these recent seasons we've seen city and liverpool on like 99 and 100 points and things like that that's not going to be predicted by a poisson model because it's always just a little bit too conservative so they're never perfect for the number of points and that's partly the nature of the model there are small things that don't quite work with the model plus it's partly that it's difficult to estimate the scoring rates for the teams so one thing we can do for example is we can use expected goals early in the season to simulate future matches so these simulations kind of work they're not very far away but they're certainly not perfect either and that's evidenced by the fact that the best teams slightly over perform what the model would predict and also the worst teams tend to underperform what the model would predict they sort of make things a little bit more even than we'd like them to be okay to summarize football's chaos makes it predictable goal distribution can be found this is what i think is one of the coolest things that just with one parameter you can find the whole distribution of
things and you can even predict very reasonably the results of a match you can predict the results of a season using those types of things score line can be deduced draws tend to be a bit wrong for different leagues that's what bookmakers spend a lot of time adjusting for you can simulate the league easily and what we're going to look at now is how you can find these as indicators of good football this simple model of poisson goals is really difficult to beat it's very very difficult to beat and you don't see people beating this model very often the bookmaker's odds slightly beat this model but really this simple model is difficult to beat okay so how do we estimate the parameters and that's the problem that was solved by dixon and coles which was mentioned in the chat beautiful paper 25 years old now incredible where they said okay goals are poisson distributed seems to be the case what we need to do is have a model to estimate the parameters for this poisson distribution so here they've written down the joint probability of the home team getting x goals and the away team getting y goals and the main part of it here is the poisson distribution here that you have lambda which is the rate at which the home team scores mu is the rate at which the away team scores and you have two separate parameter values for both of those teams and equivalently you can say as we do down here get my laser pointer out again we can say down here that the random variable which describes the scoring rate of the home team against the away team is poisson distributed with three parameters one is an estimate of the quality of the home team's attack the beta j is an estimate of the away team's defense and this gamma parameter is the home advantage parameter and all of these things can be estimated using poisson regression on the data and that's what we do in the code we try and estimate these
separate parameters see in this case they're multiplicative i think normally you would use multiplicative parameters so this thing is going to be like 1.1 for example if a home team is particularly good in attack beta j for a good away team defense is going to be 0.9 so it reduces the probability and then the home advantage is going to be some particular value so i think it's 1.2 or something like that multiplying the number of goals that you're likely to score and what we do in the code let me just go briefly i don't want to get stuck too much into the code but i do want to give you a feeling for it i need to share this screen just say overall the code that i've released today will help you get going with this the first one does the shot times loads in the different things we looked at that loads in the wyscout data generates the shots looks at all the events for the german league for example finds all the shots tags them and sees when they occur am i gonna wait for this yeah there we go so that generates the first plots that i showed you so you can play around with that it's also interesting there was a question about passes for example you can do something similar looking at passes would be interesting to do that's what code 9 does code 10 in the github does the poisson distribution so if i just put in all the code for that i should get out this was for the german league looks at the number of goals as a poisson plot you can change leagues up here and look at la liga and so on to see where you are with that all of these use wyscout data now the 11th one is different i actually wrote this code because i was very interested when corona broke out could we simulate what would have happened in the premier league there was a lot of questions about for example what was a fair decision
for who would stay up and who would go down if you had to simulate the league or if you had to award points for the thing and i think that basically the most fair way it's not going to be something that any football authority is going to say that we're going to do but the most fair way to decide the league is to use a poisson model and simulate it into the future and take the highest probability results for the teams so if a team has the highest probability of winning then that should be the one that wins the league and second and third and fourth should basically be decided on the highest probability team in that particular position and also for the ones that go down i don't think anyone's ever going to do that but that would be one way of reasonably doing it and so i actually looked at some more recent data and peter mckeever also used this data in his analysis there's this lovely data set at football-data.co.uk collected by joseph buchdahl i think his second name is brilliantly collated data set which allows you to fit models or which tells you just the results of the matches which is all we really need for this type of model so if i load up that data to be honest when i first did this fitting the model i did it at the point when the season had been postponed and we didn't know what was going to happen about corona and so on but it gives a nice data frame full of both the odds for the games and the number of goals that were scored and so on and so you've got all of that and so when i did it we actually didn't know what the rest of the season would look like and so it was a good point to try and simulate out the rest of the season now actually we have all of the data from last season loaded into this and we are going to use statsmodels again one of my favorite packages for doing statistical modeling it does the more sort of traditional statistical models the kind
of things that dixon and coles would be doing load in goal model data makes this goal model data head actually just gives us the goals that liverpool scored against norwich for example west ham against man city a simple data set of this type then this is code i want to emphasize that this code i got originally from a very nice blog about predicting models using dixon and coles you can go in and have a look at that but they basically use the poisson model in statsmodels to fit the goals as a function of the home advantage this is home one or zero if the team's at home the team the opponent and so on so we can fit this model there we go and once we have that model we can actually have the coefficients for every one of the teams so in this case i'm going to go into this back in the lecture but a positive coefficient indicates a good team a negative coefficient indicates a bad team arsenal because it's alphabetically first in the list of teams has actually been picked out as the benchmark team so what these tell you is whether a team is better or worse than arsenal so aston villa is statistically worse than arsenal bournemouth is statistically worse than arsenal chelsea are statistically better than arsenal manchester city are a lot better than arsenal so no surprises there we'd be surprised if it didn't come out with these types of things that's in terms of scoring goals and then in terms of conceding goals aston villa concede statistically more goals than arsenal do and so on and then what we can do is once we've got those rates we can start to actually simulate matches and that's what the rest of the code does i've found that the scoring rate of manchester city is 2.6 arsenal against manchester city are expected to score 1.16 and this one i think it's nice just to run this a few times so this is just some matches well that
was manchester city arsenal that was meant to be the match that would be the first match after the break and there you go i just ran the simulation came out 2-2 so that should be the result now i'll do that again run it again manchester city lost oh now they won four nil just keep doing five one three one three one i'm surprised that i've got these results well there you go we've got the randomness in football you really get an idea for these things four two eight one okay that's incredible so you can run these things and you get different results but that's not really the most scientific way of predicting the results what you need to do is simulate it lots and lots of times and that's what the last part of the code does and it makes one of these very pretty figures which tells you the probability of different outcomes so the most likely outcome in all these simulations is three nil to manchester city then you have one nil some of these results we got for arsenal were very unlikely based on this but that's the feature of randomness so this gives you a matrix of the probabilities it's like any of the heat maps that we've done earlier this is the probabilities of the different score lines occurring most of the things are clustered around here about 3 1 2 1 type of result is the most likely and then the less likely ones are further out here okay let me go back to the thing so you should definitely go in and have a play around with that code later problem that's exactly what we generated again i thoroughly recommend this blog where they go through the details of how they did this fit based on the dixon coles model but what's important here is that teams are significantly better or worse than arsenal and this is one of the things i want to get to the bottom of here because if the coefficient is positive then the team is
better than arsenal if the coefficient here is negative in this regression the team is worse than arsenal but we should also look at the probability values because when we do these statistics we actually want to find out are teams statistically better or worse than arsenal now imagine from arsenal's perspective how they're thinking about these types of things so they want to be a top four team so they can accept maybe that manchester city and liverpool who have been very strong recently are stronger than them but they are less likely to accept for example that tottenham are stronger than them or that chelsea are stronger than them because they're competing for those places in the top four and this gives us some part of the answer there which teams were statistically stronger than arsenal last season and you actually find that this one should have been red but there's just two of them liverpool and manchester city they have p values which are lower than 0.05 which is what you would usually use as a cut off so only liverpool and manchester city you could say were statistically outperforming arsenal last season but on the other side we only have three teams which are statistically underperforming arsenal and those are crystal palace norwich and watford so all of the other teams are not indistinguishable but they're very similar to arsenal and this makes it very problematic for somebody who's sitting at arsenal deciding what they should do they're basically statistically no different than how many teams 14 other teams in the premier league manchester united appear to be a bit better than them but manchester united aren't statistically better than arsenal leicester did better than them but leicester aren't statistically better than arsenal so they can certainly use this to place themselves in a ranking of how well they performed last season but
they can't say see spurs here that's quite interesting spurs aren't any different than arsenal at all and so they can't really use this to give any strong evidence of where they are or where they're going this makes it really difficult when you're planning a football club the fans have some expectations about something you should do this season you should qualify for the champions league and you can have those as goals but when you're actually reviewing the data you find that you statistically aren't very different from most of the other clubs in the league that was in attack same thing you can do in defense i haven't illustrated the things but again if you look at the p values aston villa's defense wasn't statistically significantly worse than arsenal's defense last season it's actually very few norwich's defense was statistically significantly worse than arsenal's defense last season and even manchester city's defense wasn't better so these give really even results there's always a tension there between looking for these things which are really true statistical methods that you as a statistician or a data scientist would be convinced about and what the expectations are about being able to say that certain teams are better than others nice thing there is home advantage is 0.2353 goals so i thought that was a nice thing to bring in there is still a home advantage it seemed to disappear there's quite a few studies looking at this how it disappeared a little bit during the corona times so you could actually measure that using this type of model and i think i've emphasized this a lot in what i said about the coefficients statistically significant differences are remarkably difficult to find in football and that's often why we tend to concentrate on looking at the style of play metrics that we use a lot of heat maps and passing networks and things just to identify the style of play
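To connect the coefficients discussed above with the multiplicative parameters from earlier in the lecture, here is a small sketch. It assumes the regression is a poisson regression with a log link (the usual default), so that coefficients add up on the log scale and turn into multiplying factors once exponentiated. The home figure 0.2353 is the one quoted in the lecture and is assumed here to be that log-scale coefficient; the intercept, attack and defence values are hypothetical numbers chosen only for illustration.

```python
import math

HOME_COEFFICIENT = 0.2353  # quoted in the lecture, assumed to be on the log scale


def expected_goals(intercept, attack, opponent_defence, at_home):
    """Expected goals for one team under a log-link poisson regression.

    The benchmark team (arsenal in the lecture) has attack and defence
    coefficients of 0, which corresponds to a multiplying factor of 1.
    """
    log_rate = intercept + attack + opponent_defence
    if at_home:
        log_rate += HOME_COEFFICIENT
    return math.exp(log_rate)


# Hypothetical coefficients, not taken from the fitted model.
intercept = 0.30
attack = 0.40             # positive: scores more than the benchmark team
opponent_defence = -0.10  # negative: opponent concedes less than the benchmark

home_factor = math.exp(HOME_COEFFICIENT)
rate_at_home = expected_goals(intercept, attack, opponent_defence, at_home=True)
rate_away = expected_goals(intercept, attack, opponent_defence, at_home=False)

print(f"home advantage multiplies the scoring rate by {home_factor:.3f}")
print(f"expected goals at home {rate_at_home:.2f}, away {rate_away:.2f}")
```

Under that assumption a coefficient of 0.2353 corresponds to a factor of about 1.27, which is in line with the rough figure of 1.2 mentioned for the home advantage parameter.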
because these are things that we can actually measure more reliably and even there we can't necessarily measure them in a statistically reliable way but we can actually measure them more reliably and get more insight into them than just looking at whether we won or lost football matches it's a really counterintuitive and difficult thing to understand but we want to go deeper down into our understanding because this overall are we doing better or are we doing worse just becomes a very difficult question to answer just now i'm working with hammarby and we're i think seventh in the league and when we re-simulate the table based on expected goals and goals realistically we could be anywhere between second and eighth and this is a very difficult discussion to have especially if you try and have it with players and coaches who are actively involved in lifting us up from seventh to second or first to tell them well we could have been anywhere between second and eighth is really difficult for them to swallow and it's very difficult for fans to swallow but that's actually what the models tell you when you do look at these things properly there's a wide range of places you can realistically expect to end up in any particular season if the data told us that we were likely to be 15th or something then that would be a real problem but a span of second to eighth when we're actually seventh isn't that bad well it is bad but it's not that bad okay so i did one more exercise if you go back into the women's football data last time when i presented this i looked just at passes in my poisson regression so i just looked at passes and i found that passes were a statistically significant predictor of good football and this time in order to do it properly i actually put all of the teams in there so do i see that there is a team related effect and the p values for none of
the teams are actually statistically significant i think i'm looking for the usa no even that so actually passes turned out to be quite a good predictor and that's one of the things when you do a regression of this type if you're interested in a particular metric such as number of passes you should always correct this is called i've forgotten what it's called in statistics so you have fixed effect models so the fixed effects are the teams and the passes is the variable you're interested in what you try to do is you remove the fixed effects if you're interested in a particular thing like passes and the value that passing adds you remove the team variable by using these fixed effects variables and you see that none of them are statistically significant so passing the ball a lot at the women's world cup last time as far as we can see from the data here was a reliable sign of playing good football i think that's quite a nice result we removed the fact that for example certain teams pass the ball a lot and we still get this result so passing does correlate and does seem to indicate better football in terms of predicting goals remember this was a poisson regression of passes and all of these other teams onto goals you can go back into file number eight in the github and do that analysis yourself see that i've got some very nice studies of this type so what can you tell the question came up about passing intensity so this is another one if you look in the academic literature compared to what you see written on twitter or something like that very seldom does it come up what passing rates the teams have so this is number of passes while in possession of the ball so number of passes divided by possession time this is a study by thomas grund who's a sociologist actually and he was interested in the property of football as a social network and he tested passing rate i'll come
back to network centralization but let's actually start with possession because i said in an earlier thing that ball possession wasn't statistically significant and so he looked at possession of the teams he looked at intensity which is passing rate and he looked at centralization which i'll come back to and he created three models the first model has no fixed effects so you just do a poisson regression of goals scored as a function of ball possession and he finds that that is statistically significant at the one percent level but then he says well what if i put in a team effect so he was studying the premier league and so just like i did here for all of the teams playing in the women's world cup he put in all of the teams in the premier league and he said well if i do that is possession still important and then he finds out no there's no statistical effect of possession and so basically the possession part here is that at this time when he was analyzing the game it was arsenal and manchester united who were the best teams they had more possession than the other teams and once you included them as the team the statistical effect dropped away and possession was no longer a statistically good predictor of success and then the last one he puts in both the home team and the away team that's like i did here home team here away team and then he found that again possession has no effect in fact when you account for the teams if there is any trend possession makes the teams worse rather than better then network intensity this passing rate i love a lot because it doesn't come up a lot in public discussions about teams i know that some teams do use this you basically divide the number of passes by the time in possession and he found that this was statistically significant definitely when you don't put in the fixed effects of the teams and to a sort of small degree here this was
statistically significant at a 5 and a 10 percent level this is something to be slightly skeptical about but it does seem to be a reasonable pattern so even accounting for the fact that teams are different some teams are better than others the regression indicates that passing tempo is more likely to produce good football and score more goals so concentrating on getting your team's passing tempo up is a way to expect to score more goals the last one he has is network centralization i'm gonna say a bit about what centralization is afterwards but before i do that let's just say that he also found that too centralized networks that means that the ball goes too often to the same player tend to produce worse football so the more distributed your game is the better and the more likely you are to succeed and he found this actually at a five percent level it's a very nice paper and this is how you want to do it if you've got your metric you believe will predict good football in a particular way you have to do all three regressions but you in particular have to do this last regression where you take into account team quality which is sort of taking into account the quality of the players i said centralized versus decentralized i wrote a medium article which you can have a look at about this one thing that could be noticed this was an ibrahimovic that wasn't pogba's first season but manchester united in this season what was it three four seasons ago yeah four if you count this season their passing networks so like you've seen in some of the other lectures thickness of this is how often players pass to each other their passing networks were very very focused on pogba nearly all the balls went through pogba up to rooney and ibrahimovic was also going through pogba and their results though this one they happened to win their results
weren't quite as good as liverpool who have a much more decentralized style of play and even if you look at manchester city they'll have more links between them but they have a very decentralized style of football and these are anecdotal pieces of information but the grund study shows that that's something to start looking for if you can measure network centralization which you can i think some of the other talks have detailed how you do that you measure the network centralization and you see is your team playing too centralized and if so are they playing through a particular player and there is a statistical reason to believe that you might have problems if you have that too centralized style of play good wow ranking models well i've got here that's things yeah i think what do i want to do here i certainly want to take questions so if you write questions into the chat i can answer them before i go on to ranking models wow there's lots of questions oh yeah some people are saying all right okay that's to do with practical things uploading there's one question here is anyone else using this information to help themselves do not use this information to i'll just really emphasize this point i have not said anything here which will allow you to make money gambling you need to work a lot harder if you're going to make money gambling there's a whole bit on logistic regression of odds and so on which i've missed out here deliberately in this course you will not get rich gambling using any of the models i've talked about today and so i better do this last part of the course because i'm going to point that out some things about 10 and 10 11.
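Going back to the point above about measuring network centralization, here is one simple way it could be sketched. This is an assumption for illustration and not necessarily the measure used in the grund study: it counts how many passes each player is involved in, as passer or receiver, and compares everyone to the most involved player. The score is 1 when every pass goes through a single player and 0 when all players are equally involved. The player names and pass counts are made up.

```python
from collections import Counter


def centralization(pass_counts):
    """Freeman-style centralization of a passing network, between 0 and 1.

    pass_counts maps (passer, receiver) pairs to a number of passes.
    """
    involvement = Counter()
    total_passes = 0
    for (passer, receiver), count in pass_counts.items():
        involvement[passer] += count
        involvement[receiver] += count
        total_passes += count

    n_players = len(involvement)
    if n_players < 3 or total_passes == 0:
        raise ValueError("need at least three players and one pass")

    most_involved = max(involvement.values())
    spread = sum(most_involved - value for value in involvement.values())
    # (n_players - 2) * total_passes is the largest possible spread,
    # reached when one player is involved in every single pass.
    return spread / ((n_players - 2) * total_passes)


# Hypothetical teams: one plays everything through a hub, one shares the ball.
through_one_player = {("a", "hub"): 10, ("hub", "b"): 10, ("c", "hub"): 10, ("hub", "d"): 10}
shared_around = {("a", "b"): 10, ("b", "c"): 10, ("c", "d"): 10, ("d", "a"): 10}

hub_score = centralization(through_one_player)
shared_score = centralization(shared_around)
print(hub_score, shared_score)
```

A team whose score drifts towards 1 is the kind of team playing through a particular player that the regression results warn about.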
can we see there's a limitation um yeah there's a question about comparing to another team than arsenal i'm a bit yeah i'm not quite sure i can do that directly or i don't want to say precisely how you would do it but basically those coefficients that you see in those tables um if i look at the coefficients in the tables they basically give the rankings of the teams of course they're very correlated with where you end up in the table um but you can actually i've done it statistically relative to arsenal but the coefficients there give the relative rankings of the team so you see norwich is worse relative to bournemouth i don't know technically exactly how you do that but there must be some way are you saying that game state isn't a thing i'm not sure i'm saying that game state isn't a thing i think it's not as important a thing um for the purposes that we're doing it again it relates to the gambling question if you're doing gambling then you need to account for game state in these types of models but for most of the stuff that we're interested in what we're interested in in the course is producing better football you don't want to start putting too many game state things in the first model um i don't want to give a definitive answer on that because i'm not an expert but you can get a long way without putting in game state are clubs interested in the results of models like this or is it more for fans yeah so clubs are interested in models like this so for example there's a company called 21st club and the types of models i'm talking about just now they rely heavily on them in their business so what they do is they encourage clubs to be realistic so the type of discussion i went into a little bit about becoming second and seventh and so on they go in and they work with clubs like southampton for example and say well what's your goal um and here's what we think is a realistic goal for you and where you
should be and that becomes a much more balanced discussion especially at the board level and the board becomes more realistic about like the probability of going down for example for southampton you might say fans apparently were very enthusiastic about southampton this season but southampton actually always need to consider the probability of going down and they have to realize that there is always that three percent chance that southampton will go down independent of what they do and that's really difficult to take on board but you want to try and get the board of clubs to understand those types of things is the dixon coles formula the one you used in your book to compare betting strategies um sort of yes so in one of the betting strategies i use passing rate and expected goals i did a dixon and coles regression to find the coefficients of um of those types of things maybe i will do a gambling if you want to now i'm just selling my new book but my new book the ten equations starts with a whole bit on gambling and the equations you need to do gambling it doesn't tell you all the answers but it's a starting point for those types of things and the information that you need so my disclaimer is nothing in this talk uh well this talk might be the start of producing a gambling model but a lot more is needed good so then i think it's good because a lot of the questions are actually about limitations so i think it's perfect that i finish up with a couple of things about ranking models this is my own work i think that it's really for reference to have a look at later so elo models i'm not really going to go into details of what they are because i'm going to say that they don't really work for the purposes that we're interested in the elo model is a model which was mainly used for chess when you have head-to-head games between chess players you get points for a win um and you lose points for a loss but that
point exchange depends on the level of your opponents so if a very good team loses against a very poor team they give the other team more points than if a good team loses against another very good team then they just exchange a small number of points and there's a nice website at clubelo.com which uses elo models to describe football so my problem with that type of model is that it has no grounding in the randomness argument that i brought up at the start it's just a sort of exchange of points it has no grounding in what actual football is and scoring goals and conceding goals it's not really grounded in that and i think you're going to do better with models which are grounded in um in that kind of thing so i think it's fun and interesting with the elo models sometimes it's nice to look up especially for inter-league comparisons um where you might rank the teams but i'm not sure that it's really the best type of modeling and that's why i'm not going into it we might have a video later which does explain how you do it um but i'm not a big fan the 538 global club soccer rankings that's got better and better over the years it started off as an elo ranking type of system but they've really fine-tuned it and done some very interesting stuff to make it perform better don't think it beats the bookmaker's odds but they have put in a lot of work and i basically would recommend that you look into the methodology there um to use it but they do all of this simulating the league trying to find out the probability that their team will qualify for the champions league that they'll win the premier league interesting to see manchester city up there is 57 so they're really trying to make predictions of the future just a little bit about how it works what they do again you can read all of this from their web page what they do is they do a lot to look at the market advance so they have a base model which is
the market value of the team so every year they've got the soccer spi this is the global soccer ranking rating of the team they do a regression here they do a linear regression of the total value of the team the monetary value of each of the players total them up give the um total value there and um they do a regression there to try and predict in the future how well they'll do so they predict their postseason ranking based on their market value then they combine that this is just a random parameter i imagine that they've tuned up and chosen for some reason one-third of it is the transfer market two-thirds of it is how they were rated at the end of last season and then that gives them a pre-season rating so you would call this the prior um for their model it's what you expect before the season started it takes the rankings from last season and it throws in something to do with the market value of the players so this is a kind of wisdom of the crowds type of thing and then they get a pre-season rating of each team then they use goals shot-based expected goals and non-shot expected goals so the expected goals we've talked about non-shot is basically how often the team gets into the box and are in near shooting positions and it's a model i haven't actually looked at this of course but it's an interesting model you try to say how often when you get into a particular position um is it likely to be a goal whether there is a shot or not and you can fit well goals is simple but you can fit shot-based expected goals as we've done in previous lectures you can also look at this non-shot which is just based on whether there was a shot or not you get into that position and how likely you were to score and then for every match you look at the adjusted goals you look at the shot-based expected goals you look at the non-shot expected goals and use that to update the rankings of the players it was quite cool to see they
use exactly this i showed this heat map of the results i've no idea why they also used everton versus manchester city in their example but apparently it's a popular example to use this one is liverpool versus brighton most likely result two nil to liverpool but a whole range of different results that they can produce then they simulate the league okay i feel like while i'm talking i'm going on and on and on about the limitations of these models even though i'm presenting how you should use the models i'm basically talking through the limitations of them um this is some of the stuff that i wrote in soccermatics this is for example so pre-season predictions you see a lot of these now you have three different types you have um the ones based on models like the 538 model you often read stories before there's a big world championship you always hear about how pricewaterhousecoopers or some large consultancy firm have predicted what's going to happen in the world cup and there's always a story in the daily telegraph about that um and yeah we've got the 538 model so you have models you have the experts they're pundits who are saying um what's going to happen and then you can actually compare this to just what typically happens in this championship so um one simple model you can compare these experts and the models to is just to say well where you finished last year is where you will finish next year so liverpool will finish first in the premier league manchester city will finish second and so on down the table and what i did is i compared the average position error for pundits who'd actually ranked teams all the experts are put along here compared to the position the season before and you see there's only one person who over performed to any large degree this was for 2014-15 for 2013 15 people did there was a few people who did over perform a little bit i
noticed this guy in particular he over performed in these seasons but they didn't really over perform if you just took the average error over these seasons so all the experts were a little bit lucky for some reason they tend to the experts seem to i looked into this in more detail in this blog i just put it up on medium if you want to have a look at it um this year this was the year that leicester won i mean leicester won so they had a very unexpected position but if you just wrote down everybody's going to follow exactly the same thing as they did the previous season you would have done better than all of the experts put together so these experts who are giving their punditry very seldom do better than just guessing what happened in the previous year and then even the models as well so even models based on the poisson distribution um based on the rankings based on elo rankings i haven't tested the 538 model but it wouldn't surprise me if this was true as well they don't tend to over perform what happened in the previous year and this is how much money you would lose if you just bet at random you'd lose this much money if you use the euro club index you would lose this much money so really no difference um there either and you see this over and over again the euro club index doesn't beat the betting markets oh i said i wouldn't talk about betting but i did put a little bit in i mean joseph buchdahl who um who runs the football data he's written a book really just homing in on this analysis time and time again that you can't beat the bookmakers the bookmakers odds are the sort of culmination of the wisdom of the crowds um so they just don't beat the bookmakers so the bookmakers odds just beat pretty much all of these types of models um he does have a strategy now now i feel i'm
recommending he does have a strategy which is based on sometimes the bookmaker's odds become unbalanced and then if you're very patient you can find those types of things so this is his strategy that he outlines in the book which has made some kind of gain over the last five years you can look on the web page for how that works but it's a very boring strategy of finding when the favorites are mispriced and the favorites are mispriced now and again by certain bookmakers who aren't doing their job properly if you're very quick and you have lots of accounts you can go in there and exploit that tendency but for the most part you're not going to win money um betting against the bookmakers because they know better what they're doing than you plus they then have some edge between one and five percent over you um so you need a lot more before you're going to get there okay summarizing simulations and prediction um i'm just going to leave it at that because i've gone over time and i don't like to go over time and so i will say thank you for today um i think i'll just have a quick look at these questions and answer them for anyone who does want to stick around but otherwise i've finished the main part of this talk and i will see those of you in the course on thursday and i'll also go into the slack group and answer any questions in the slack group i've made a new one for the projects um anyway now i'm just babbling i will have a quick look at these questions and see what you've got here yeah there's a question what are the adjusted goals actually i don't know what the adjusted goals are looking at do you prefer 538 ranking over the euro club index yes i do i actually read about that yesterday and today and i'm very impressed with their new ranking system just how they built it up uh the statistical steps they've made i haven't looked to see if it beats the odds and things like that probably doesn't
but it's more systematically created it uses more of the poisson regression approach that we've talked about today um and uh it works yeah i think i do prefer it more than the euro club index uh could someone answer robert how you get into the slack group you sent me an email robert i've answered it and i told you that you should look on the front web page to get into the slack group if you look on the web page of the course you'll find details of how to get into the slack group good okay great thank you all for today and i will see you for a lecture next week and for tutorials on thursday
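The heat map of results mentioned in the lecture (for example a most likely result of two nil) can be reproduced with a small sketch of the poisson approach. This assumes each team's goals follow an independent poisson distribution; the scoring rates used here are made-up numbers for illustration, not fitted values from the lecture, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals when goals arrive at the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def score_matrix(home_rate, away_rate, max_goals=10):
    """matrix[h][a] = probability of a scoreline h-a, assuming the two
    teams score independently (an assumption of this sketch)."""
    return [[poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
             for a in range(max_goals + 1)]
            for h in range(max_goals + 1)]


def outcome_probabilities(matrix):
    """Sum the scoreline grid into home win, draw and away win."""
    home = draw = away = 0.0
    for h, row in enumerate(matrix):
        for a, p in enumerate(row):
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


def most_likely_score(matrix):
    """Scoreline with the highest probability in the grid."""
    _, h, a = max((p, h, a)
                  for h, row in enumerate(matrix)
                  for a, p in enumerate(row))
    return h, a


# Hypothetical rates: home side averages 2.1 goals, away side 0.6.
grid = score_matrix(2.1, 0.6)
print(most_likely_score(grid))
print(outcome_probabilities(grid))
```

Simulating a league in this spirit would mean drawing a scoreline from such a grid for every fixture and repeating the season many times, which is only as good as the rates that go into it.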
Info
Channel: Friends of Tracking
Views: 1,693
Rating: 5 out of 5
Id: 3pnkARyrtMo
Length: 113min 35sec (6815 seconds)
Published: Tue Sep 15 2020