Stacked Ensemble Modeling with Tidy Data Principles

Captions
Simon was an intern at RStudio, where I work on the tidymodels team. Simon is a statistical software developer who works on open source R packages, mainly focused on modeling. He studies at Reed College, and he was recently awarded the 2021 John Chambers Statistical Software Award, which is really exciting and so well deserved, and I think he also recently received an NSF Graduate Research Fellowship, which is also super exciting. What Simon is going to be presenting about today is something I always love learning more about: ensemble modeling. So Simon, thank you so much for speaking, and let's get started. Oh, before you get started: for those of you who are listening, if you have questions while we're listening, feel free to type them in the chat and we can help surface them to Simon if he doesn't see them, and we'll have time for questions at the end as well. If you want to verbally ask a question, maybe save it for the end. So use those two opportunities if you've got questions as we go along. Simon, thank you so much for coming and sharing with us.

Thank you so much for the kind introduction. I'm super excited to be here, and I'm really grateful that so many people showed up. To start off: I am also not from Salt Lake City, but I want to do the touring-musician thing real quick and just say that Salt Lake City has a very special place in my heart. A little bit of background: I'm from a small town in Kansas, and when I was wrapping up high school, I started getting these shiny brochures in the mail from a small school in Portland, Oregon. So when I graduated high school, I hopped in my combine, put it in reverse, and made my way out to Oregon
for the first time. I know that animation was really smooth, but if we catch it right in the middle, you'll notice Salt Lake City is right there, so when I'm making the journey out, you can find me sleeping in the back of the cab somewhere in the suburbs of Salt Lake City.

My name is Simon. I'm going to be talking about this R package, stacks, that I worked on while I was an intern at RStudio, and that more recently I've been working on as part of my senior thesis project at Reed College. If you've been to a talk of mine before, you might remember two things, in addition to maybe a tiny bit of content if I'm lucky. One is that I'm not very good at naming things, so thankfully Dr. Silge was slick enough not to prompt me for the title of the talk and instead supplied a very thoughtful and appropriate one, so we have this very digestible "Stacked Ensemble Modeling with Tidy Data Principles." Thank you for that. The other thing you might remember is that I'm not very good at drawing, so most of the slides will be in this style, where I've jotted stuff out with my tablet.

There are two elements to this: the tidy data principles and the stacked ensemble modeling. Since we're in a group co-organized by Dr. Silge, I'm going to assume that y'all are familiar with the tidy data principles thing; it's something about rows and columns, if I remember correctly. For the stacked ensembling part, I'm going to start with a story. When I was starting at Reed, I was in my statistics professor's office hours, and we were talking about modeling generally and what the process is usually like, and he drew two axes. If you're not from Salt Lake City, like me, hopefully we can all come together on some familiar territory: the Cartesian coordinate system. On the x-axis we have the
degree to which a statistical model is difficult to fit. This could be anything from the computational complexity of a model fit, to how difficult it is to specify your model hyperparameters, to how difficult the data preprocessing is for a given model. On the y-axis we have some sort of performance metric; you could choose any error metric you want, whatever determines how satisfied you are with how your model is performing. He was arguing that there's this sort of curve (somebody smarter than me can put in the chat whether it's concave or convex) with diminishing returns: when you first start, those little bits of effort you put in are really going to give you leaps and bounds in performance improvements, and eventually there's only so much you can do to pick up on whatever underlying structure is present in the data. As an example, if you don't want to spend much time on a model at all, you could fit a constant function at zero regardless of what the data is; and you could go all the way to something as complex as a neural network, which might take a long time to fit, and for which you need to think about your dropout rates and your number of hidden units and things like that. As I spent more time with statistical modeling, I came across this folk understanding that not all models live along this curve; specifically, the one I'm thinking about is linear regression: easy peasy, knocking it out of the park every time, right? So one way we could formulate the goal of model stacking is to be as extreme an outlier here as possible, and eke a little bit more performance out of our model with just a little bit more effort. And I promise this is a joke. Okay, so we'll reset our axes a little here, but we'll still be on the x-y plane. I
wanted to give a simple example of what I mean when I say ensembling. Say we have two variables, x and y; they're both numeric, and there's some underlying relationship or data structure between them that we want to pick up on. x is our predictor variable and y is our outcome. If you see this, and you have some domain knowledge and you're comfortable fitting a linear model, that's totally fair game: you could fit a line of best fit straight through. Or maybe you have some domain understanding, or you just look at this graph and think the linearity assumption is inappropriate; then maybe we can do something non-parametric, like a k-nearest neighbors model. There's literally an infinite number of models you could choose to fit to try to deduce the underlying structure of the data generating process. What an ensemble does is make use of any statistical models you're interested in working with and generate predictions that are informed by each of them. In this example, a simple implementation of a model ensemble just takes the average of the linear model and the non-parametric model; in practice this is called model averaging. You can include any sort of model you want in a model ensemble. The model that figures out how to combine (in this case) the linear and the non-parametric model is called the meta-learner, and in practice the meta-learner most people use is a penalized linear regression model. So for the linear model and the non-parametric model here, in your training step you figure out how heavily you want to weight each of these member models in determining what your ensemble model's predictions are. In the rest of our time, we'll play around with a quote-unquote real-world example.
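To make the model-averaging idea concrete, here is a minimal base-R sketch (this is illustrative, not the stacks package itself): one linear member, one non-parametric member, and an equal-weight "meta-learner" that just averages their predictions. The data is made up for illustration.

```r
# Simulated data with a nonlinear trend
set.seed(1)
x <- runif(100, 0, 10)
y <- sin(x) + x / 2 + rnorm(100, sd = 0.3)
dat <- data.frame(x = x, y = y)

lin_fit <- lm(y ~ x, data = dat)     # linear member
np_fit  <- loess(y ~ x, data = dat)  # non-parametric member

new_dat  <- data.frame(x = seq(1, 9, by = 2))
lin_pred <- predict(lin_fit, new_dat)
np_pred  <- predict(np_fit, new_dat)

# Equal-weight "meta-learner": the mean of the members' predictions
ensemble_pred <- (lin_pred + np_pred) / 2
```

A real stacked ensemble replaces that fixed 50/50 average with weights learned by a penalized regression, which is what stacks does.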
Another feature of my talks that I didn't mention is that I have to come up with something far-fetched for a real-world application. So let's revisit our combine story. I'm a family guy, right, so pre-COVID, every morning when I head off to class, I actually stay in Kansas and drive from Kansas to Oregon: a quick 28-hour commute. And there's this cultural affectation out in Oregon that I've picked up on since then: a method of transportation that's much faster than a combine. I'll type it in the chat; people might have heard of these; they're called Subarus. A super quick method of transportation, great gas mileage, all sorts of great benefits there. So on a given day when I'm heading out from my house in Kansas (I guess, post-COVID, when I can do class in person), I want to know, when I go to bed: given the temperature and which method of transportation I'm taking, how long can I expect it to take me to get from De Soto, Kansas, to Reed College? That's our applied example, and I've painstakingly collected this data over 500 days using the rnorm function. We can check out what this actually looks like. Sometimes I ride my bike from Kansas to Oregon; I always note down the temperature when I leave; and for some reason I thought it was a really reasonable way to measure the amount of food that I consumed, in cups, in the previous 24 hours. So this is food, in cups, and we want to predict the amount of time, in hours, that it takes me to get to Oregon. Okay, so what are the building blocks of these things? If we go back to the example on the scatter plot: in tidymodels, what is this linear-line thing, what is this non-parametric-line thing?
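The talk's actual simulated data is in the linked repository; here is a sketch of how commute data like this might be generated with rnorm(). The column names and coefficients below are my assumptions, not the talk's actual values.

```r
# Hypothetical version of the 500-day commute data set
set.seed(2021)
n <- 500
commutes <- data.frame(
  temperature = rnorm(n, mean = 55, sd = 15),  # temperature when leaving
  food_cups   = rnorm(n, mean = 4,  sd = 1),   # food eaten in the prior 24h, cups
  mode        = sample(c("combine", "bike", "subaru"), n, replace = TRUE)
)
# Outcome: hours to get from Kansas to Oregon, with mode-specific offsets
speed_bonus <- c(combine = 0, bike = 5, subaru = -20)
commutes$time_hours <- 28 + speed_bonus[commutes$mode] -
  0.05 * commutes$temperature + rnorm(n, sd = 2)
head(commutes)
```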
If you're not familiar with the tidymodels ecosystem: it's an ecosystem in the way that the tidyverse is; it's the younger sibling of the tidyverse in a lot of ways. It's developed by a group of collaborators who meet regularly and all work for RStudio, and all of these packages have a shared design philosophy and are designed to work really well together. The elementary unit that stacks builds off of we call a model definition; for people who are familiar with tidymodels, this is a minimal instance of a workflow, where you just have a parsnip model specification and a preprocessor. One of the main points the tidymodels ecosystem makes about the way we should fit statistical models is that we should separate the step of actually fitting the model from the step of specifying what the model actually is, the set of instructions. In the linear regression case, if we're talking about lm (if you're familiar with that function), there are the parts of lm that say: this is a linear model, these are the algorithms and forms of optimization we use to fit data and predict new values; and then there's the separate step of actually carrying out those computations. So tidymodels lets us provide these sets of instructions to specify the models we're interested in, and then we can reuse those definitions throughout this process when we're fitting to different partitions of the data that we're training on. The first step here is to pass each of these model definitions through data resamples. When I say resamples: if I have all of this commute data, I want to split these 500 rows up; some of them I'm going to use to train the members in the ensemble model.
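A model definition as described here might look like the following sketch: a workflow pairing a preprocessor (a formula, in this simple case) with a parsnip model specification. The formula's column names follow the hypothetical commute data; this is the standard tidymodels pattern, lightly simplified.

```r
library(parsnip)
library(workflows)

# The "set of instructions": what the model is, separate from fitting it
lin_reg_spec <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

# A minimal workflow = preprocessor + model specification
lin_reg_wflow <- workflow() %>%
  add_formula(time_hours ~ temperature + food_cups + mode) %>%
  add_model(lin_reg_spec)
```

Nothing has been fitted yet; `lin_reg_wflow` can now be reused against any resample or partition of the training data.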
Some of them I'm going to use to eventually evaluate how that model is doing. You can use any resampling scheme you're interested in when you're passing these model definitions through; this is the first go at fitting these sets of instructions to some sort of data. Common examples of resamples are a bootstrap of a data set or k-fold cross-validation. One of the side effects of separating the specification of the model's form from fitting it to something is that we can say: this k-nearest neighbors model could use any number of neighbors. We could try a k-nearest neighbors model that just looks at the two closest points, or the four closest points, and all of those can be encapsulated in this one model definition. In this example, the k-nearest neighbors model specifies four candidate members, and each candidate member is a set of instructions trained on some partition of the training data, each using a different number of neighbors. The linear regression model just outputs one candidate member; this is one of the lines we're building our ensemble off of. And for a neural network model, we could think about how many layers we want the network to have, what the dropout rate should be, and things like that. The first step (and maybe this will be easier to wrap our heads around with a concrete example) is that, once these models are trained, we want to collate the actual value of the outcome variable in the validation set. The validation set is just another partition of the training data that we use to evaluate how well our model is doing, and we ask each of these candidate members to offer their prediction of the outcome in the validation set.
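Generating those four KNN candidate members from one model definition might look like this sketch: tune over the `neighbors` parameter across resamples, keeping the information stacks needs via its control function. It assumes a `commutes` data frame like the simulated commute data described in the talk (and the kknn package for the engine).

```r
library(tidymodels)
library(stacks)

folds <- vfold_cv(commutes, v = 5)  # k-fold cross-validation resamples

knn_spec <- nearest_neighbor(neighbors = tune()) %>%
  set_engine("kknn") %>%
  set_mode("regression")

# One model definition, four candidate members (one per neighbors value)
knn_res <- workflow() %>%
  add_formula(time_hours ~ temperature + food_cups + mode) %>%
  add_model(knn_spec) %>%
  tune_grid(
    resamples = folds,
    grid      = 4,                     # four candidate members
    control   = control_stack_grid()   # retain predictions for stacking
  )
```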
In the example from our commute times: all of this code is on the GitHub repository that I dropped in the chat, and I'm not going to go over in detail what this tidymodels code does. The gist of model ensembling with stacks is that most of the code you're going to be writing is tidymodels code, and it's outside the scope of this talk to fully specify all of those steps, but feel free to poke around the repository. Making use of these different model specifications (one for linear regression, one for k-nearest neighbors, one for a neural network), the first step in constructing this data stack, the thing that collates all of the actual values in the outcome column with the predictions from each of the candidate members, is to initialize the object. You can think of the stacks() function like the ggplot() function from ggplot2: it sets the foundation, and then you iteratively add new elements to it to build your ensemble model. If we run this little bit of code, it just tells us this data stack object exists and is ready to go, and that it doesn't have any model definitions or candidate members yet. Running the first bit of code and plopping it into the viewer, we can see the actual time in hours from the training data that it took me to get from Kansas to Oregon, and the predictions of what this column ought to be from the linear model. We can iteratively call this add_candidates() function on each of these different model objects, on the KNN and the neural network as well, and each of these models will offer its prediction of what this outcome, time in hours, ought to be. Since the k-nearest neighbors model specifies four different candidate members, each using a different number of neighbors,
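Initializing the data stack and adding candidates follows the stacks API shown in the repository; in this sketch, the `*_res` objects are assumed tuning/resampling results (as from tune_grid() or fit_resamples()) for each model definition.

```r
library(stacks)

# ggplot2-style iterative construction of the data stack
commute_st <- stacks() %>%
  add_candidates(lin_reg_res) %>%  # 1 candidate member
  add_candidates(knn_res) %>%      # 4 candidate members (one per neighbors value)
  add_candidates(nnet_res)         # neural network candidates

commute_st  # outcome column plus one prediction column per candidate member
```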
we can see that the k-nearest neighbors model definition offers four candidate members, and the same goes for the neural network. And maybe we see that the plot from earlier didn't quite hold true: the neural network is essentially a constant function at one, predicting that it's going to take me about an hour to get from Kansas to Oregon every time. An advantage of an ensemble is that we can look at this data intuitively and say: this linear regression seems to be doing a pretty good job of predicting what this time in hours should be, but maybe there's some underlying structure that a k-nearest neighbors, or even the neural network, is picking up on. So it's an automatic way of examining the assumptions each of those models makes about the underlying structure and figuring out how to combine them in a useful way. That step of figuring out how to combine them, more concretely, involves fitting a regularized linear model on this data stack. We fit a penalized linear regression in which the true outcome variable is the response and all of the predictions from the candidate members are used as predictors, and the betas of this model, in the context of the stacks package, we call stacking coefficients. If one of these candidate members has a really high stacking coefficient, that means its predictions of what the outcome variable should be are really influential in determining what the ensemble's predictions will be; in the same way, if a candidate member has a stacking coefficient of zero, its prediction doesn't affect the ensemble's prediction at all. This is carried out in the package using the blend_predictions() function: once you've added all your
candidates using the add_candidates() function, you can pass the result to blend_predictions(), and this carries out the step of fitting that regularized linear model on the data stack. Lastly, once we have all of these stacking coefficients, we know that if a candidate member has a stacking coefficient of zero, we don't need to fit that model on the entire training set. So far we've only fitted each of these candidate members on resamples of the training data, and sometimes fitting these members on the entire training set is much more computationally intensive than doing so on the resamples. So now we can take a moment of pause, make sure we're fitting only as many models as we want to, and then fit the members with non-zero stacking coefficients on the entire training set. In stacks, this gets carried out with the fit_members() function. So really there are four core functions that get you to a spot where you have a model object that behaves like any other you would expect: you can pass this stack object that's been run through fit_members() on to predict(), give it new testing data, and it'll give you predictions on that data, and it interfaces the same way any other model object would. But there's this very natural question: once you've fitted this ensemble, how does the ensemble relate to the members that inform its predictions? Is this basically just the linear regression with a tiny little bit of neural network, and that's ultimately what determines the model? So the second part of the functionality this package supplies is a few different functions, a few different methods, for understanding how this ensemble is affected by each of its members.
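The remaining two of the four core stacks verbs can be sketched as follows, assuming a data stack `commute_st` already built with add_candidates():

```r
library(stacks)

commute_model <- commute_st %>%
  blend_predictions() %>%  # fit the penalized meta-learner; learn stacking coefficients
  fit_members()            # refit only non-zero-coefficient members on the full training set

# The fitted model stack then predicts like any other model object:
# predict(commute_model, new_data = commutes_test)
```

blend_predictions() is where zero stacking coefficients prune candidates, so fit_members() only pays the full-training-set cost for members that actually contribute.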
One way we can go about understanding how this model stack works is with a set of autoplot() methods. There are all sorts of visual ways to think about understanding how this ensemble is working. This sort of plot tells us: as we increase the penalty on that regularized linear model, shrinking more and more coefficients to zero and getting rid of members, how does the root mean squared error change? We can see that it gets a little bit better until we really bump that penalty up, and then we start to see a deleterious effect. We can also see that as we bump the penalty up, the number of members shrinks, which is what we would expect: as you penalize adding members more, you'll ultimately have fewer members in your model stack. This autoplot() method has a number of options for its type argument, and another really helpful way of understanding how this ensemble is behaving is the type = "weights" option. This shows us the stacking coefficients for the most notable models in the model stack; in this case we can see that there are actually only three total in the model stack, so the predictions from this ensemble will be dominated by the linear regression, with a little bit of influence from the neural network models. Again, we can interface with this ensemble object (I've called it st here) in the same way we would with any other model object, so we can pass it to predict() and check how the predictions from the model compare to the actual time in hours. It looks like it's doing a pretty good job; ideally this would be a perfectly linear relationship. But again, ensembling begs the question of how the predictions from the members compare to the predictions from the ensemble, and another way we can look at this is by passing a members argument to predict().
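The diagnostic plots described here correspond to the stacks autoplot() methods; `st` below is assumed to be a blended model stack.

```r
library(stacks)
library(ggplot2)

autoplot(st)                    # performance metric (e.g. RMSE) vs. penalty
autoplot(st, type = "members")  # number of retained members vs. penalty
autoplot(st, type = "weights")  # stacking coefficients for each member
```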
We've allowed users to generate, on some testing-set data, not only the ensemble's predictions but the predictions from the members as well. This looks a lot like a data stack, except that since we've only selected these three members, we have the prediction from the ensemble as well as the predictions from the three members that were selected. Visualizing that data with a little bit of tidyr and dplyr magic, we can see that we're essentially working with a linear regression with a little bit of neural network flavor there at the bottom; the ultimate predictions from this ensemble, shown in red here, are essentially the outputs from the linear regression, just bumped down a little thanks to the neural network's outputs. So those are a few graphical ways of evaluating the performance of an ensemble model. Another way is to use functionality from the yardstick package, a package inside tidymodels, and what we can do here is evaluate the root mean squared error of the ensemble as well as of all of the member models. We can see that the ensemble has indeed beaten the linear regression, just by a little bit, and hopefully with not too much more effort. All right, so what have we learned? The number one thing: ensembling is pretty slick. Hopefully it's not too much more difficult than fitting any other model you would in tidymodels, and it often has this effect of increasing predictive performance compared to many common statistical models. I appreciate the opportunity to be here, and next time you see a combine parked somewhere in the suburbs, just know it's me, I'm hanging out, and I'll be on my way out in the morning. Thank you.

Thank you so much, Simon. Here in Salt Lake City, Subarus are also an important part of our culture, so they're big here.
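Comparing member and ensemble predictions with yardstick might be sketched like this; `commutes_test` is an assumed held-out test set and `st` an assumed fitted model stack.

```r
library(stacks)
library(yardstick)
library(dplyr)

# Ensemble prediction (.pred) plus one column per selected member
test_preds <- commutes_test %>%
  bind_cols(predict(st, ., members = TRUE))

# RMSE for the ensemble and for each member, in one pass
test_preds %>%
  summarize(across(starts_with(".pred"), ~ rmse_vec(time_hours, .x)))
```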
Combines, not so much. So there's one question in here that I think is a really interesting thing to talk about, and something that overlaps with some stuff I've been working on as well: what is the output that you have? Are you dealing with separate models at the end? What do you do when you want to go and make predictions, and how might you save that object if you wanted to use it later? Someone asked: if I wanted to put it into production, what object would I put into a container? You showed how to make predictions, but do you want to talk about what that object is, what's in it, that it can be saved, maybe how big it is compared to other kinds of models, and what kinds of concerns have to be thought about if you go to use it later?

Okay, so many good questions here. That st object that I was pushing around at the end and sending to plot methods can be saved in the same way as any other R object, with the save() function. But the question about how big it is and what's inside of it is a really good one. The model stack object is actually just a list, and there are all sorts of small things the package keeps track of in order to be able to ask for predictions from each of those members and ultimately combine them. Notably, the biggest objects inside this model stack are, one, the actual regularized linear model, fitted with glmnet, that combines the predictions from each of the member models; and then the other biggest element of the model stack object, aside from
that, is the fits of each of the member models. So one of the considerations you might be tossing around when you're determining how this can best be put into production is that a model with fewer members will be smaller in file size and will ultimately be able to generate predictions more quickly than an ensemble with many members. Another thing: the model stack object has a set of methods associated with the butcher package, which is a package inside tidymodels. The idea of butcher is that if all you really want to do is train this model, keep it around, and generate predictions from it, there's a lot you can get rid of inside the model object. I'll type the name of the package, butcher, in the chat. The size of the object in memory can be reduced by quite a bit by getting rid of everything that's not absolutely necessary to generate predictions. With those two things in mind, eliminating models by increasing your penalty, and butchering once you've determined what your ultimate fit will be, that model could be containerized in a similar way to other models. The one thing of note, and this will be handled better in the next release of the package, is that the libraries implementing the functionality of the member models need to be available at predict time: if I've fitted a neural network with keras, keras needs to be loaded when predicting, and right now the package doesn't fail as gracefully as it ought to. That's something all of tidymodels is really getting up to speed with as we work more on model ops, and all of tidymodels is getting real, better support there, which is really good.

So there was a question about what kinds of models can be used. But first, there's a
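The slim-and-save workflow described here might look like this sketch; `st` is an assumed fitted model stack, and the file path is illustrative.

```r
library(butcher)

st_small <- butcher(st)                 # drop components not needed to predict
saveRDS(st_small, "commute_stack.rds")  # serialize, much like pickling in Python

# Later, in a fresh R session (member-model packages, e.g. keras,
# must be loaded too before predicting):
# st_small <- readRDS("commute_stack.rds")
# predict(st_small, new_data)
```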
quick question: I think someone knows Andrew Bray, and is asking, do you know Andrew Bray, do you work with Andrew Bray, is Andrew your advisor?

Andrew was the professor who drew the silly graph, at the time.

Nice, nice. But Andrew's not your advisor, right?

Yeah, I guess I should mention: this package was co-authored with Max Kuhn, and the first half of the package was written while I was an intern at RStudio; wrapping up the package was in collaboration with Kelly McConville, who is my academic advisor as well as my thesis advisor at Reed College. Both of them played a major role in how this came together.

Awesome. So there's a question about what kinds of models can be used as ensemble members: can I just use anything that's in parsnip, or what are the options? And maybe you want to address whether there are best practices for deciding what to try or what to include as possible ensemble members.

Yeah, so for folks who are more familiar with tidymodels: any model type that's implemented in parsnip (and I guess this is in theory; you're welcome to file an issue if I say something works and it doesn't) should be able to be added to an ensemble. Any parsnip-adjacent package, or model type that has a wrapper using parsnip infrastructure, should be able to be added in the same way. And any metric that's implemented in yardstick, or a yardstick-adjacent package, should be able to be used to define what counts as a model that's performing well, quote-unquote. In terms of deciding what you want to add: whenever Max gives a talk about this package, he'll add 300 or 400 models
or something, and this is actually what a model ensemble in practice tends to look like: people will add quite a few different model types, so the model table will be diverse, and then they'll tune over a huge number of hyperparameters. But the biggest consideration, if you really want to keep the number of models small, is figuring out a set of models that makes a really diverse set of assumptions about the underlying structure of the data and the data generating process; that diversity in candidate models will tend to result in a better-performing ensemble.

I was muted, sorry. Great, that makes a lot of sense; thank you for that. There's a question here about using resampling in that initial phase, and why that's done. Say you had a ton of data: is there a reason to use lots of resamples there, or could you do it with just a validation set? Could you skip that? Why is resampling used at that stage?

Okay, this is a great question; I appreciate you asking it. The stacks package won't support adding candidates from models that were not trained on resampled data, and the reason for that gets at this conversation about overfitting: ideally, these models should not be evaluated on data they've already seen. I'll drop a few links I compiled; these are also in the README of the repository. Dr. Silge and Max wrote the Tidy Modeling with R book, which walks through a lot of this functionality and the reasoning for why some of this comes together the way it does, and they have this idea (I don't know, Julia, if you want to talk more about it) of "spending your data" that really evocatively gets at why
we only want to use small partitions of this data while we're training. Nice — I think those are some good links, and that generally gets at it. I do wonder, though: in rsample you can set up a single validation set, basically as a single instance of, say, Monte Carlo resampling. Do you know if that would work? Have you ever tried it? I guess I'll claim that it works. Okay, yeah — you'd expect it to work, hopefully. Yeah, any resampling scheme from rsample should — because, for example, all the tuning functions work that way. If you just have a ton of data and you don't need to do bootstrapping or v-fold cross-validation — you're like, "I just need to make one validation set because I have so much data" — I would think that would probably work, if you had the resources. So there's a question here: say you went through this whole process and you have your ensemble, and maybe it turns out one of your stacking coefficients is really big compared to the others. Can you go into your stack, pull out that one model, and just use it on its own, since it's trained on the training data? That's an interesting question. So you can certainly identify the hyperparameters that were used in fitting whatever model is doing awesome and being weighted super heavily in the ensemble. There's a helper function in the package called collect_parameters — I just dropped it in the chat — and that function will identify the members in question, what hyperparameters they use, and their stacking coefficients, sorted in order. So you could refit using those, I guess, is one answer. I
actually haven't tried to pluck those out. There are certainly fully fitted, ready-to-go model objects inside that list — I think it's underneath the model_defs element of the model stack object — but I'd be interested to poke around in there. Me too, actually — that's a really great question. There's a question here: is there something equivalent to pickling in R? And yes — pickling is serializing, and in R you just save an RDS file. There's a base R function, and there's also a function in the readr package that has different rules around compression — whether you do or do not compress — that gives you a little more control. So if you're used to pickling something in Python, what you want to do in R is serialize the object to RDS with saveRDS. That's a binary format the way a pickle file is, so you can do the same thing, and then you can open it back up in a new R session and predict with it. I think these are some really great questions — Simon, you got really great questions here. Thank you all so much for your thoughtful questions. I think there's maybe one more question for you: how does stacks handle data pre-processing? With these workflows, can you have different pre-processing for different kinds of models if they need it? I mean, Julia, you might know this better — they're just workflows, right, from my understanding? Yeah — if the thing can be output by tune, using fit_resamples or tune_grid or something like that, it's fair game to add. Yeah, I think they're just workflows, so I think you can — if you
had something like a k-nearest neighbors model where you think, "oh no, this one really needs to be centered and scaled," you could put the pre-processing you need, using a recipe, into that workflow and then add it as a candidate. And if you had a range of random forests where you're happy to throw whatever in there, you could have that as a different workflow with different pre-processing. I believe that's part of the flexibility built into the concept of a workflow, and why stacks was built on workflows — so you can have that kind of flexibility. These are really great questions, and I feel like we got through a ton of stuff here, which is fantastic. Does anybody have anything else before we wrap up? Awesome — well, here we can all do some Zoom clapping. Thank you so much, Simon, for that really interesting walkthrough of combines and Subarus and how to use stacks — that was really helpful. I appreciate you all coming to our lunchtime talk for the Salt Lake City R Users Group. Next month, in May, we would love to have you join us as Rachel Tatman speaks about how to build a data science portfolio. We've got talks lined up through the rest of the spring, then we'll take a break for the summer and see how things shake out for the fall. So thank you again, all, for coming, and we will see you next time.
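For readers following along, here is a minimal R sketch pulling together the pieces discussed in the Q&A: candidates tuned on resamples, workflows with different recipes, blending into an ensemble, inspecting stacking coefficients with collect_parameters(), and "pickling" with saveRDS(). The dataset (mtcars), recipe steps, and grid size are illustrative assumptions, not from the talk.

```r
library(tidymodels)
library(stacks)

set.seed(123)
folds <- vfold_cv(mtcars, v = 5)  # candidates must be trained on resamples

# Two candidates making different assumptions about the data: a linear model,
# and a k-nearest neighbors model whose recipe centers and scales predictors
# (the per-workflow pre-processing mentioned in the Q&A).
lin_wf <- workflow() |>
  add_recipe(recipe(mpg ~ ., data = mtcars)) |>
  add_model(linear_reg())

knn_wf <- workflow() |>
  add_recipe(
    recipe(mpg ~ ., data = mtcars) |> step_normalize(all_predictors())
  ) |>
  add_model(nearest_neighbor(mode = "regression", neighbors = tune()))

# Save the predictions stacks needs via the control_stack_* helpers.
lin_res <- fit_resamples(lin_wf, resamples = folds,
                         control = control_stack_resamples())
knn_res <- tune_grid(knn_wf, resamples = folds, grid = 5,
                     control = control_stack_grid())

mod_stack <- stacks() |>
  add_candidates(lin_res) |>
  add_candidates(knn_res) |>
  blend_predictions() |>
  fit_members()

# Hyperparameters alongside stacking coefficients, as discussed above.
collect_parameters(mod_stack, "knn_res")

# The R equivalent of pickling: serialize, then reload in a fresh session.
saveRDS(mod_stack, "model_stack.rds")
mod_stack2 <- readRDS("model_stack.rds")
```

This sketch assumes the tidymodels and stacks packages (and a kknn engine for the nearest-neighbor model) are installed; readr::write_rds() is the alternative serializer with finer control over compression that came up in the discussion.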
Info
Channel: Salt Lake City R Users Group
Views: 365
Rating: 5 out of 5
Keywords: Rstats, Tidy data, statistics, data science, data modeling
Id: HZXZxvdpDOg
Length: 46min 51sec (2811 seconds)
Published: Fri Apr 09 2021