Matt Dancho | Using R, the Tidyverse, H2O, and Shiny to reduce employee attrition | RStudio (2019)

Captions
[Music]

Oh man, this is exciting. This is my first RStudio conference, by the way, guys. Yeah, I'm pumped. I'm a big fan of RStudio the company, and I think you guys know why; it's probably the same reasons that you're here. So today what we're gonna be doing is talking about R for business. That's kind of where I specialize: I marry up business with data science. My company's Business Science, so you can probably guess how I got the name. Business plus data science, merged. You got it.

All right, so what we're gonna be talking about today is a lot of workflow. We're gonna be talking about how to solve a specific business problem. It's called employee attrition, and it ends up being a huge problem for businesses. It's a big dollar figure, fifteen million dollars per year, and we'll talk a little bit about how I came up with that number. But more importantly, what we're gonna be doing is showing how the different tool sets, including the tidyverse, my favorite modeling package H2O, and also Shiny, combine to really help us solve this business problem.

So I'll be your host. Again, my name is Matt Dancho. I am the founder of Business Science, an educational company. I'm a lover of R. I've been using it for quite a while now and have even contributed some open source packages as well, probably the most popular of which is tidyquant, and I do dabble in finance as well. But really I'm an educator of data science. I specialize in teaching data science. I do on-site workshops at companies, and I've also created an online platform called Business Science University, where students can come and take a range of courses that really up-level them, accelerating their careers. Oh, and one more thing: one of my special passions is converting business people to data scientists. So I just want you guys to know, if you are a business person in the audience, you don't need a PhD to be a data scientist. I'm just throwing that out there.

All right, so the agenda for today. We're gonna be talking
about a few different things. The fifteen million dollar per year problem: that's gonna be our focus. We're focusing on a business problem, employee attrition, and we'll find out a little bit more about what that means. The second thing, and this is what I'm super excited about: I'm unveiling a new Shiny web app that we're going to be teaching in the 300-series course at Business Science University. It's part of the Data Science for Business program, so you guys are gonna see it first, right here at RStudio. It's awesome. And just one other thing about that app: I want to give credit to Kelly O'Briant. She's the one who developed it. She's an RStudio employee and I work with her quite a bit. Then we're gonna talk about the internals of the app, the data science workflow, what powers this app: the tidyverse, H2O, and also another package called lime that I'm very excited about. And then finally we're gonna pull it all back together and talk about learning R, and how you guys yourselves can figure out how to do all this stuff that I'm teaching you.

Okay, so that fifteen million dollar per year problem. You guys might recognize this gentleman's face: Bill Gates. He was once quoted as saying, "You take away our top twenty employees and overnight we [Microsoft] become a mediocre company." So let's dissect what he's saying. "Take away": he's talking about a concept called employee attrition, employee turnover. "Top 20": he's talking about high performers. This is the top 20 people in his company, and often what we find is that the 80/20 rule applies: the top 20 percent of employees in a company tend to generate about 80 percent of the results. So you really want to do what you can to keep and retain those high performers, otherwise your company is going to become a mediocre company.

So let's dive into this a little bit further. I want to talk about this curve. It's called the economic value of an employee over time, and it looks a little bit like this. So we'll see
that there's four different boxes up here. The first box is the curve. When an employee just starts at a company, they're represented by that green dot there, right at the beginning. Time has not elapsed, and actually what is happening is that the company is investing, and actually losing money, having you as part of that company. That's an investment that they're making in you. They have to provide you training and mentorship, they have to integrate you, and it takes a while to do this before you can become a productive member of that company.

Then eventually you make it to the second box, and this is what's called the break-even point. You gradually get to that point where you begin to start generating returns for that company, and they call that the return zone. So you've got the investment zone and you've got the return zone. That process can take as little as three weeks for jobs that aren't overly difficult, or for highly technical jobs it can take upwards of a year or longer. So that's the type of investment that the company is making in you.

So what happens when a person decides to quit? That's what this third box represents. That employee has become a productive member of the company. They're generating returns, doing really good work, and then all of a sudden they decide to quit. That could be because they hate their boss, it could be because their work-life balance is all out of whack, or they may just not like the work that they're doing. What ends up happening is that when that person quits, that line immediately drops down to zero, and then it extends for a period of time. Eventually the company decides to replace that person, typically, and then the cycle repeats itself. They're reinvesting in the new person, getting them up to speed, and it takes a while before they get to the return zone. So as you can see in box number four, we've got lost time and lost productivity. This is what you want to avoid, especially for high
performers. So there's two different types of attrition, and we're gonna focus on trying to prevent your high-performing employees from leaving. Not all employees are created equally. Some employees just never quite get there, and that's what this first box here is, what we call necessary attrition. This could be because that person just might not be a good fit for the culture of the company, may not mesh well with it, or it might be a poor job fit. Whatever the reason, they just don't quite cut it, and it's okay to lose these people. But what you really want to prevent is the right-hand box, which is bad attrition. This is when you have high performers that have generated returns. You don't want to see them leave the organization.

So you can actually assign a dollar figure to this particular problem, and it's actually a very simple calculation. In fact, this is some R code. I know it might be a little bit difficult to read, but what this is is a function that I created. It's called calculate attrition cost, and it takes a few different parameters and then performs just a vectorized calculation that incorporates direct costs, lost productivity, and some assumptions in there. It also subtracts out the salary and benefits, which is what the company actually saves from losing an employee. So for a high-performing employee that is a productive member of that company, it can be upwards of seventy-eight thousand dollars per employee that the company loses when you assign a cost to them.

The funny thing that happens is that companies typically don't just lose one person. It ends up being more of a systemic issue. So what happens when a company such as, say, Walmart or Target, a competitor, moves in and starts stealing your high performers? Before you know it, you've got a couple hundred that you've lost, and if you lose 200 high performers each year, that's a 15 million dollar problem. Yikes. So we want to prevent these high performers from leaving
the company. I think that's pretty obvious by now. So the way we can do that is through using data science, and I'm gonna show you the product first, just because I want to show you guys the end result. So we're gonna do a quick Shiny web app demo. Again, this is something that's exclusive to the RStudio conference. I'm really excited about this. This is what we're developing as part of the 300-series course. There's a 100, a 200, and a 300, and those who take that course are gonna be able to learn how to build this app.

So I'm just gonna click this Run App button, and if all goes well we get a Shiny app. What this app represents is the end product. You can imagine that you've got managers out there that are responsible for their employees. They are responsible for retaining them, making sure that they develop them into productive people, making sure that they're happy, and that's exactly what this app allows us to do. So I'm just gonna scroll and kind of show you. Each one of these numbers represents a specific employee, and this particular one, employee number 891, has a 47% prediction risk. This employee is actually predicted to leave, because that prediction risk is above the threshold for deciding whether or not that employee leaves. This is actually H2O under the hood that's generating this predictive model, and the app is encapsulating it.

So put yourself in that manager's shoes. They see that they've got an employee that's 47% likely to leave, and then what we have over here are the reasons, the features, for why that person is predicted to leave. This actually comes from the lime package. This first feature here is stock options: that person has stock option level 0. This is something that the manager can actually affect or change. It's what's called a lever: a feature that the manager can adjust. So maybe moving that employee from level zero to level one and giving that employee stock options, that might be
enough to help that employee to stay. The next one is that the employee has over 28 years at the company. The next one is that the number of jobs that employee has worked at is over 6. And then the next one is that the person has a training time last year of 2. So these two here are not levers that can be adjusted. You can't really toggle the number of years that the employee has worked or how many jobs they've had previously. But the manager can take a look at training times, maybe to help get them engaged, and give them a few more training sessions a year.

So this is really cool. We're able to actually bottle up data science into this machine learning app, and even further than that, we can develop predictive recommendations that are presented to the manager. That way they don't have to think up strategies themselves; that's already incorporated for them. So for example, this management recommendation strategy has a work environment strategy of "promote job engagement", so that manager should then focus on activities that will promote job engagement. This is what we're talking about when we provide Shiny web apps to non-technical people, business leaders that have a stake in the game, where they can actually affect and make decisions that improve the company. That can make a huge difference.

So let's talk about that next. Let's go here. This is what I call the better decision-making effect. This is when you provide the Shiny web app to the decision-makers in your company, you've trained them on how to use the app, and you see them making positive decisions. What ends up happening is that, as you monitor the impact of the decision-making improvements, you can start to see your percentage of attrition going down, down, and down. It starts up here around 20%, ending in about a year at 9%, and then you can actually use that cost of attrition function to calculate how much you're saving in the organization, and
then how does that look when you show your boss at the end of the year: "Hey, I just saved your company 4.3 million dollars"? That looks pretty good, and you'll probably get, I don't know, maybe a promotion out of it. Anyways, the better decision-making effect: what I want to get across here is that we can use Shiny apps to actually influence the decisions within our own organizations, and that's how we generate business value.

So what I want to talk about next is the data science workflow. We talked a lot about the end product, and hopefully you like the app. Now what I want to do is talk about the process to get to that app. How do you go from business problem to business value? It requires a process that can be shown as a workflow. There's three stages: preparation, experimentation, and distribution. In the preparation stage, you're acquiring data and you're reformatting and cleaning data. Then you move into the learning, which is the experimentation phase, where you've got your data, you're coming up with hypotheses, and you're meeting with stakeholders in the company that know that data inside and out to help you develop your strategies. Then you move into the transformation and visualization stage, and then you get to modeling and validation. And then you find, okay, we aren't getting a good model yet, and you've got to go back and re-evaluate your hypotheses, or you may have to go back and acquire data. So you've got this process that's iterative, and eventually you get a model that starts to show that there could be a possibility of changing decisions. You then move into that final phase of distribution. You develop reports, you convince people in the company, the executive leadership, that this is the right thing to do, and then you build an app, you deploy it, and you start getting those business results. So that's how you generate business value.

What I want to show you is that the tidyverse, it's a
perfect fit. You can see that we've got a bunch of packages that I've overlaid. You've got readr for reading in data. You've got dplyr, tidyr, lubridate for date and time series, stringr for text, and forcats for categorical data, to help you clean that data. You get into the experimentation stage, and once you're doing the visualization and the transformations, you're working with dplyr and ggplot2. H2O once you get into the modeling and validation. Then you move into reporting, where you've got R Markdown, and then deployment, where you've got Shiny. It just fits, and this is what I call the R tool chain. It's a fabulous tool chain. I've been teaching it for quite a while, and I'll tell you what, this gets results.

So if we examine our workflow: dplyr is awesome for sizing the problem, identifying where we've got issues in our data. In this particular analysis, what we're doing is looking within job roles to find where we've got a high cost of attrition, by cohorts within the data. ggplot2: imagine if you make a lollipop chart like this and you show your management that, hey, we've got a problem here, and you know what, I've identified where we've got that problem. The sales executive role: 3.92 million per year, that's how much it's costing us. We'd better try and retain some of those high-performing sales execs. Laboratory technician, that's the next one: 3.85 million. We should focus on them.

Then you get into the modeling. You've got H2O, and literally this is all the code that it takes to make that model. I want to give a quick shout-out to Erin LeDell at H2O. Her team has done a fabulous job of making a super scalable, high-performing machine learning algorithm. This is called automated machine learning. It's automated, so it makes it super simple to get all the models, 20 different models: GLMs, deep learning, stacked ensembles, where they merge them all together and average them, GBMs. They have just done a really good job, and this is
all it takes to start getting a predictive model. And then we need to explain these features. It's not good enough to have a prediction risk. Management doesn't care about that. What they want to know is, how can I change my decision-making? And this is what lime gets you. This is literally all the code it takes to develop that lime explanation. This is it. You make a lime chart like that, and you can see right away that this employee has overtime equals yes, so maybe their work-life balance is out of whack. Maybe you should look at that to try and retain them.

And then Shiny. You've got H2O, lime, dplyr, and the rest of the packages; that's all under the hood. Shiny is really the glue that allows us to join everything together, and that's what I'm so excited about: the Shiny web infrastructure makes it so simple to get business value, business results. So again, this is the data science workflow. Just to recap: you've got this amazing tool chain, and what that allows you to do is build web apps that can change decisions. And don't forget, you need to monitor those decisions and show the savings that you're generating for the organization.

All right, real quick. I just showed you an amazing tool chain, and you're probably wondering, how do I learn all this stuff? Well, I've been doing workshops and online education for quite a while now, and in my experience learning R is like a hill climb. The good data scientists really focus on the fundamentals first and then move their way up the hill. So the fundamentals: data cleaning and manipulation, that's learning things like dplyr, tidyr, stringr for text, lubridate, and so on. Then we've got visualization, functional programming, advanced data science, and the goal, which is Shiny and R Markdown.

All right, so before I end here I just want to give you guys a quick bonus. We actually do have a number of courses that we're developing strategically to help get you there. If you want to get there fast, it
doesn't take years, it takes weeks. And I even have a special bonus for you: we're giving everyone 15% off at Business Science University. The promotional code is RStudio. All right, thanks everybody for your time.

[Applause] [Music]
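
A minimal R sketch of the attrition cost calculation described in the talk: direct costs plus lost productivity, minus the salary and benefits the company stops paying while the position is open. Only the function's name and general shape come from the talk. Every parameter name and default value below is an illustrative assumption, chosen so that the result lands near the roughly $78,000 per employee and $15 million per year figures quoted above; it should not be read as the speaker's actual code.

```r
# Sketch of an attrition cost calculator. All parameter names and default
# values are assumptions for illustration, not figures stated in the talk.
calculate_attrition_cost <- function(
  n                            = 1,       # number of employees lost
  salary                       = 80000,   # annual salary and benefits
  separation_cost              = 500,     # direct costs of one departure
  vacancy_cost                 = 10000,
  acquisition_cost             = 4900,
  placement_cost               = 3500,
  net_revenue_per_employee     = 250000,  # annual revenue attributed to one employee
  workdays_per_year            = 240,
  workdays_position_open       = 40,      # days the seat stays empty
  workdays_onboarding          = 60,      # days the replacement spends ramping up
  onboarding_productivity_loss = 0.50     # share of output lost while ramping up
) {
  # Direct costs of losing and replacing one employee
  direct_cost <- separation_cost + vacancy_cost + acquisition_cost + placement_cost

  # Revenue lost while the seat is empty and while the replacement ramps up
  productivity_cost <- net_revenue_per_employee / workdays_per_year *
    (workdays_position_open + workdays_onboarding * onboarding_productivity_loss)

  # Salary and benefits the company does not pay while the position is open
  salary_benefit <- salary / workdays_per_year * workdays_position_open

  # Vectorized: any argument may be a vector (for example, one n per job role)
  n * (direct_cost + productivity_cost - salary_benefit)
}

cost_one <- calculate_attrition_cost()         # one employee
cost_200 <- calculate_attrition_cost(n = 200)  # 200 employees per year
```

With these assumed defaults, one departure costs about $78,500 and 200 departures cost about $15.7 million per year. Because the arithmetic is vectorized, the same function could be applied to a column of head counts, for instance to total the cost by job role as in the lollipop chart described in the talk.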
Info
Channel: RStudio
Views: 4,134
Rating: 4.7333331 out of 5
Keywords: Matt Dancho, rstudio, data science, machine learning, python, stats, tidyverse, data visualization, data viz, ggplot, technology, coding, connect, server pro, shiny, rmarkdown, package manager, CRAN, interoperability, serious data science, dplyr, forcats, ggplot2, tibble, readr, stringr, tidyr, purrr, github, data wrangling, tidy data, odbc, rayshader, plumber, blogdown, gt, lazy evaluation, tidymodels, statistics, debugging, programming education, rstats, open source, OSS
Id: 9VztG5c1bwk
Length: 20min 20sec (1220 seconds)
Published: Mon Jul 29 2019