Partial dependence plots for Mario Kart world records

Captions
Hi, my name is Julia Silge. I'm a data scientist and software engineer at RStudio, and today in this screencast we are going to use this week's TidyTuesday dataset on Mario Kart world records to train a model that predicts whether any given world record was achieved using a shortcut or not. We're going to use a decision tree model for this. Decision tree models are so nice: you can understand them, they're very explainable, and the preprocessing they require is very low maintenance. They do tend to overfit, especially when we fit a single decision tree, so we'll be sure to tune them carefully on resampled data. We're also going to walk through how to build a partial dependence profile, or partial dependence plot, for the probability of using a shortcut as a function of the world record time. We'll use the DALEX package for that, which works really well with tidymodels. Let's get started.

Okay, let's learn about Mario Kart world records. Looking at this week's TidyTuesday data, we have information about the date each record was made, the time of the run in seconds, how long it was a record for, the person who set it, whether it used a shortcut or not, and whether it was a single lap or three lap record. Let's count the type: single lap or three lap. Let's also see how many tracks there are: 16, so that's not too many; that's a reasonable number to include in a model. And then shortcut: No and Yes, with a little bit of imbalance there, but not too bad.

So that's the data we have, and what we're going to do is train a model to predict, based on these other characteristics of the record, whether someone used a shortcut or not. Can we predict that? Let's make one exploratory plot: date on the x-axis, time on the y-axis, color mapped to track, and points, so we can see how the records change over time. I don't want to see all 16 tracks in a legend, so I'll turn the legend off. Let's facet in a grid, with whether it is a three lap or single lap record on the rows and shortcut on the columns, and let's make the scales free. So we've got no shortcut in one column and yes shortcut in the other, and single lap times versus three lap times on the rows. It's pretty interesting: we do see the times drift down, but then they're fairly stable after about 1999 or so. You can also see the different tracks in the different colors. So we're going to use all this data to try to build a model that can tell the difference between a record that was made taking a shortcut on the track and one that was not.
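(Roughly, the exploratory steps described above look like the sketch below; it assumes the TidyTuesday data has already been read into a data frame called `records`, and the exact ggplot2 arguments are reasonable guesses rather than code copied from the screen.)

```r
# Exploratory look at the data; assumes the TidyTuesday Mario Kart data is
# already loaded as a data frame called `records` (e.g. via the tidytuesdayR package)
library(tidyverse)

records %>% count(type)       # single lap vs three lap records
records %>% count(track)      # 16 tracks
records %>% count(shortcut)   # No vs Yes, with some imbalance

records %>%
  ggplot(aes(date, time, color = track)) +
  geom_point(alpha = 0.5, show.legend = FALSE) +   # drop the 16-track legend
  facet_grid(rows = vars(type), cols = vars(shortcut), scales = "free") +
  labs(x = NULL, y = "Time (seconds)")
```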
So let's build a model. Let's load tidymodels, and first let's set up our data budget, spending our data wisely. Looking at records again: shortcut is our outcome, and let's use track, type, date, and time as predictors. If a column is a character, let's change it to a factor; factors are generally better for modeling. Let's do stratified sampling; we have a little bit of imbalance, but I'm not going to handle it beyond that this time. That gives us a split, and we make our training set and our testing set from the split. Then let's make a set of resamples. I started with v-fold cross-validation, but notice how there are very few examples in the assessment set of each resample, so let's change from v-fold cross-validation to bootstrap resamples; I think that will be better. Remember that we resample the training data, and we can stratify the resampling too.

We're going to fit a pretty straightforward model, but we are going to tune it. Let's call it tree_spec: a decision tree where we tune the cost complexity and the tree depth (we could also tune min_n if we wanted), with the rpart engine, training a classification model. This is our model specification: we say what kind of algorithm we're going to use, and we tune these hyperparameters because single decision trees overfit pretty badly and generally do need to be tuned to perform well.

Next let's set up some values to try for these hyperparameters. Let's make a regular grid over cost complexity and tree depth with levels equal to seven, so we try seven levels of each; that means overall we're trying 49 different models. Now let's put those things together in a workflow: we declare our workflow and add our model, tree_spec. (Whoops, I had a typo with set_engine and set_mode there for a moment.) One great thing about tree-based models like a decision tree is that they are very low maintenance when it comes to preprocessing; they don't require much. When we look at the training data we've got a date, a number, and factors, one of which has 16 levels, and that's all totally fine; we can throw it all into the decision tree and it will work great. Or at least, if it doesn't work great, it won't be because of the preprocessing.

Let's use parallel processing to make this go faster, and then let's tune: we pass in our workflow, our resamples (the bootstrap resamples we made), the grid of possible values, and we save the predictions for each resample so we can make some plots. Let's call the result tree_res. I'm going to start this and it is going to train: for each of the 49 possible decision tree models, it fits to each of the 25 resamples and assesses on the held-out part, so fit 49 times and assess 49 times per resample. Multiply that out and it's quite a number of models to fit, so this might take a minute; I'm going to pause the video and come back when it is done.
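(Here is a sketch of the modeling setup just described; the model formula `shortcut ~ .`, the random seeds, and the use of `control_grid()` are assumptions filled in to make the sketch self-contained.)

```r
# Data budget, model specification, tuning grid, workflow, and tuning;
# the formula shortcut ~ . and the random seeds are assumptions
library(tidymodels)

set.seed(123)
mario_split <- records %>%
  select(shortcut, track, type, date, time) %>%
  mutate(across(where(is.character), factor)) %>%   # characters to factors
  initial_split(strata = shortcut)                  # stratified sampling

mario_train <- training(mario_split)
mario_test <- testing(mario_split)

set.seed(234)
mario_folds <- bootstraps(mario_train, strata = shortcut)  # switched from vfold_cv()

tree_spec <- decision_tree(
  cost_complexity = tune(),
  tree_depth = tune()
) %>%
  set_engine("rpart") %>%
  set_mode("classification")

tree_grid <- grid_regular(cost_complexity(), tree_depth(), levels = 7)  # 49 combinations

mario_wf <- workflow() %>%
  add_model(tree_spec) %>%
  add_formula(shortcut ~ .)   # assumed preprocessor: all other columns as predictors

doParallel::registerDoParallel()

tree_res <- tune_grid(
  mario_wf,
  resamples = mario_folds,
  grid = tree_grid,
  control = control_grid(save_pred = TRUE)   # keep predictions for plotting
)
```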
All right, it is done. We fit all of those models and we have all these sets of predictions, so now we can choose which one we're going to use and see how it did. For example, we can call collect_metrics() on tree_res; that gives the metrics for all of the models we tried, nearly 100 rows of results. If we want to see which ones turned out best, we can use show_best(); with the roc_auc metric it looks like a tree depth of eight was best, and we can also look at it with accuracy. We can also visualize how the tuning results turned out; let's zoom in and take a look. Notice that a tree depth of one, stubby little trees, is no good, while a tree depth of eight is best; the other depths are worse, so performance goes up and then back down. For cost complexity, it looks like we get a best value for accuracy right around here, while roc_auc looks really flat, so we could choose something in this region. I might choose by accuracy: the value it says is best is that little bump up, and it corresponds to a slightly simpler model, so that seems like a good option.

We can also look at the predictions. We can call collect_predictions() on tree_res, which unnests all of those predictions for us; it's quite a lot, so it takes a moment. Then we can group by id, which identifies each bootstrap resample, and compute an ROC curve for each one, where the true value is shortcut and the estimate is the predicted probability of the "No" class, and then plot those curves. With decision trees we often get these funny, steppy-looking ROC curves, because a tree of depth eight can only produce a limited set of distinct predicted probabilities. That looks about how we expect; pretty good.

Now we can decide how to choose our final tree. We can call select_best() on tree_res with the accuracy metric and call the result choose_tree. We could also use select_by_one_std_err() on the tree results, sorting by minus cost complexity: that says don't take the numerically best (smallest) cost complexity, but a simpler model within one standard error of the best roc_auc. That turns out to be the same value, so it looks like a good one to choose.
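(Below is a sketch of how these evaluation steps might look in code; the manual ggplot2 ROC plot stands in for whatever plotting was shown on screen, and the select_by_one_std_err() call reflects my reading of the narration.)

```r
# Inspecting the tuning results and choosing a final set of parameters
collect_metrics(tree_res)                    # accuracy and roc_auc for all 49 models
show_best(tree_res, metric = "roc_auc")
show_best(tree_res, metric = "accuracy")
autoplot(tree_res)                           # how the metrics change across the grid

# one ROC curve per bootstrap resample
collect_predictions(tree_res) %>%
  group_by(id) %>%
  roc_curve(shortcut, .pred_No) %>%
  ggplot(aes(1 - specificity, sensitivity, color = id)) +
  geom_abline(lty = 2, color = "gray60") +
  geom_path(show.legend = FALSE, alpha = 0.6) +
  coord_equal()

choose_tree <- select_best(tree_res, metric = "accuracy")

# a simpler-model alternative: within one standard error of the best roc_auc
select_by_one_std_err(tree_res, -cost_complexity, metric = "roc_auc")
```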
So we take our workflow, which was a tunable workflow, finalize it with the tree we have chosen, and then use last_fit(). last_fit() fits one time on the training data and evaluates one time on the testing data, so we give it the split, which contains both the training and the testing data. Let's call this our final result. We can call collect_metrics() on it, and these metrics are computed on the testing data, not the training data, so we can compare them to what we saw earlier with show_best(). We get roughly the same value, which tells us whether or not we are overfitting; decision trees do tend to overfit, so that's something to check.

Inside this final result there is a fitted workflow; we have to get it out, which is a little bit awkward, but once we do we have a fitted workflow we can predict with. For example, we could predict on new data (let's pretend some of our data is new data). This is also an object we can save using saveRDS() or write_rds(), load up in another R session, and predict with later. Looking at the fitted tree, the first split is on time, and then we start splitting on tracks, time again, and type (single lap versus three lap). There are different things you can do to visualize that.

What I would like to do in the last bit of the video is show some model explainability. Decision trees are nice because they are pretty explainable already; you can see in the tree itself how it is going to make a decision. But maybe we would like to understand how the predictions for whether there was a shortcut or not change with time, and the way to do that is a partial dependence plot. The way I like to do that right now is with the DALEX package, and DALEXtra has pretty good support for tidymodels: there's a function called explain_tidymodels(). We give it our fitted workflow plus two more things, the input data and the outcome. The data is what the model was fit with, the training data, except that it does not want the outcome in there, so we pass mario_train without shortcut, and then we give shortcut separately as y. Let's call this mario_explainer; the first step to using DALEX is to create an explainer. Oh right, I forgot: it does not like the factor, it wants y to be an integer. Okay, now we've got our lovely explainer: it wraps the workflow and the data, and we can use it to do a lot of different kinds of things, like Shapley values or variable importance. Let's do a partial dependence plot.
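(A sketch of the finalizing and explainer steps described here; the row indices used as pretend new data and the saved-file name in the comment are purely illustrative.)

```r
# Finalize the workflow, fit on training data, evaluate on testing data,
# then build a DALEX explainer for the fitted workflow
library(DALEXtra)

final_res <- mario_wf %>%
  finalize_workflow(choose_tree) %>%
  last_fit(mario_split)

collect_metrics(final_res)                   # metrics computed on the testing data

final_fitted <- final_res$.workflow[[1]]     # or extract_workflow(final_res)
predict(final_fitted, mario_train[35:40, ])  # pretend this slice is new data
# readr::write_rds(final_fitted, "mario-tree.rds") would save it for a later session

mario_explainer <- explain_tidymodels(
  final_fitted,
  data = select(mario_train, -shortcut),     # predictors only, no outcome
  y = as.integer(mario_train$shortcut),      # DALEX wants the outcome as an integer
  verbose = FALSE
)
```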
The function for this is called model_profile(). The first thing we pass in is the explainer. Then there is N, the number of observations used to build the profile; let's use more than the default of 100. There is also a variables argument for which variables to profile; we'll say time, so time goes on the x-axis of our partial dependence profile. And we can pass groups, which is often interesting; type for us is whether the record was a three lap or single lap run, so we can see whether there are differences there. Let's clean that up a little bit and call the result pdp_time. Once it has computed that for us, you can just use DALEX's default plotting if you want, but you can also get the data out of the object: inside there are the aggregated profiles and the more individual, observation-level profiles. If we take the aggregated profiles, we can customize the plot however we want, which is something I like to do.

For example, let's remove the "workflow_" prefix from the label, and then build a ggplot: the profile's x values on the x-axis, y-hat on the y-axis (I could run janitor::clean_names() on this to make the column names easier to type), color mapped to the label, and geom_line(). (And I mistyped the pipe there for a second.) Now I can make this plot look however I want if the defaults aren't to my taste. What's on x is the time it took to complete the record; I'll label it "time to complete track" because there are different tracks. The y-axis is the predicted probability of a shortcut. I'll set the color legend title to NULL, give the plot the title "Partial dependence plot for Mario Kart world records", and add a subtitle noting these are predictions from a decision tree model.
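(A sketch of the partial dependence profile and the customized plot; the value N = 500 and the exact styling choices are assumptions, while the `_x_`, `_yhat_`, and `_label_` column names come from DALEX's aggregated profiles.)

```r
# Partial dependence profile for time, grouped by type, plus a customized ggplot
pdp_time <- model_profile(
  mario_explainer,
  variables = "time",
  N = 500,             # more observations than the default of 100 (assumed value)
  groups = "type"
)

# plot(pdp_time) gives DALEX's default plot; below we use the underlying data instead
as_tibble(pdp_time$agr_profiles) %>%
  mutate(`_label_` = str_remove(`_label_`, "workflow_")) %>%
  ggplot(aes(`_x_`, `_yhat_`, color = `_label_`)) +
  geom_line(size = 1.2, alpha = 0.8) +
  labs(
    x = "Time to complete track",
    y = "Predicted probability of shortcut",
    color = NULL,
    title = "Partial dependence plot for Mario Kart world records",
    subtitle = "Predictions from a decision tree model"
  )
```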
Decision tree models often produce these kinds of funny shapes. Remember that the first split in the tree was at about 85 seconds, and that's why we see this big drop here. After that we see roughly the same shape for both types, and the increase up at higher times is probably driven by certain tracks. This approach can be used with any kind of model, and partial dependence plots let us learn how a predicted probability changes with a predictor.

All right, we did it! We tuned and trained our decision tree model so that it is not overfit, we made use of its low-maintenance preprocessing, and we got pretty good results predicting whether a Mario Kart world record used a shortcut or not. After training the model, we used the DALEX package, which has great support for tidymodels, to do some model explainability. Decision tree models are pretty explainable already, but because of the way they work it can be difficult to look at the raw output of a decision tree and say how the model behaves with respect to, in this example, the length of time the world record run took to go around the track. Making a partial dependence plot, or partial dependence profile, gets you that, so I really like tools like this for understanding our models more deeply after we have appropriately tuned and trained them. I hope this was helpful, and I will see you next time!
Info
Channel: Julia Silge
Views: 2,111
Rating: 5 out of 5
Id: bn48fQ8aEDA
Length: 27min 28sec (1648 seconds)
Published: Fri May 28 2021