Improving Speed of Shiny Apps by Pre-Computing Models

Captions
- Everyone can see the screen now. Yeah, thank you very much for having me. I'm here to share a little bit of my experience through the journey of creating Shiny apps. First of all, I must emphasize that I'm by no means an expert or professional in creating apps, let alone Shiny apps, but there are a lot of things I learned through creating some of these apps, and it is a great honor to have a platform like this to talk about some of these experiences. A little bit of housekeeping: you can access the slides through this link, bit.ly/Shiny831, and there are some R scripts that I'm going to be using for this presentation; once you can access the slides, you can download those from there as well. Going back to the title: as the title suggests, today's presentation is going to focus on the usability of our Shiny apps. When it comes to creating an app, we really want to make sure the app is responsive, fast enough that people don't click a button and wait ten seconds for another screen to come up. Often this involves good design of the workflow behind what's going on on the server side of your Shiny app, but it also involves a lot of efficient coding. That is what I'm going to walk you through, drawing on some of my experience. But first I'll demonstrate one of the more elaborate Shiny apps I have built as part of my PhD research. This one looks at the relationship between travel time to health facilities and malaria. Malaria is still a devastating disease affecting hundreds of millions of people around the world. A lot of us might not have heard or worried much about malaria, but for people in Sub-Saharan Africa, for example, malaria is still affecting economies and livelihoods. For this project I worked with my PhD advisor, Denis Valle, my former lab mate Justin Millar, and our Ghanaian collaborators. We focused on a district in the northern part of Ghana called Bunkpurugu-Yunyoo, a pretty small district at the border with Togo. What you're looking at right here is the result of a survey done back in 2010 to 2013: the malaria prevalence of the district. Specifically, the survey focused on children under five, the age group most affected by malaria compared to older children and adults. As you can see from the map, even within the district itself the southern regions experience very high malaria prevalence, up to 80% or 90%, while the northern area has a much lower prevalence, 10% or 20%, for example. There are two urban centers in this district, Napanduri and Bunkpurugu. Based on the data we collected and the statistical analysis we've done, it turns out that one of the strongest predictors of malaria prevalence here is actually distance, or travel time, to health facilities.
When we say that statement out loud, it makes sense to ask: if that's the case, why not build more health facilities all over the district, which would presumably help bring down malaria prevalence? There is a bit of a causal relationship there, because we know that early treatment and diagnosis of malaria can actually stop malaria transmission as well. That is the whole idea of this app: we want to let people explore what happens if you add more health facilities onto the map, and what that means for malaria prevalence or malaria incidence. As you can see from the metric selector here, you can choose to see malaria incidence, malaria prevalence, or travel time to the nearest health facility. The idea is that when people interact with the map and add more health facilities, they get a sense of what the data actually tell us, whether it makes sense, and, if it does, where the best place would be to put new health facilities. What you're seeing here are the eight health facilities that exist in the district: five of them, in blue, are health centers, and three of them, in red, are community-based mobile health posts. There are lots of things you can look at. If you look at travel time, for example, you can see that in the southern regions of the district the travel time to the nearest health facility tends to be large, so people there have less access to healthcare. What we ask the user to do is choose where to add a new health facility. Say I look at this map and decide that people in this region have less access to healthcare, so let's add a health facility here. I click on that spot, a red dot appears, and I say add facility here. Once I've added it, I click update predictions to see what it means. It's perhaps easier to look at the difference from the baseline, because we can now see what has changed. Right now we are looking at travel time: by adding a health facility here we improved the travel time in the close vicinity of the facility, which all makes sense, but you can see that the impact, the predicted impact I should add, is quite localized. You can also look at prevalence; again, the changes are pretty much localized to what you see here, and the change in incidence is even smaller, it seems. Over here, the app shows the district-wide prevalence, district-wide incidence, and district-wide travel time per person, and you can see that the reductions from adding the health facility here are very, very small. That seems strange, right? I added a health facility in a spot where travel time was high, so what happened? It's because there are strong interactions with the underlying population: there are simply more people living in the northern area of the district than in the southern area, so when you improve the travel time here, you improve it for only a small portion of the people.
As a result, you don't see a big impact on the district-wide metrics. So with that mental model, I go back and choose a spot near the roads. It's not very clear here, but there are three roads leading to this particular spot called Yunyoo, a very small rural town. Let's add a health facility here. Once I've added it, I click update predictions again. It's not too obvious, but if I look at the difference from the baseline, I can see that the improvement in prevalence is a whole lot more; the reach of this new health facility is much larger than at the previous spot. Travel time is also improved, if only marginally, for a lot of the surrounding region; basically the impact area is larger. When it comes to the district-wide metrics, the reductions from the baseline tend to be larger as well, simply because we chose a spot where the population is higher. That is the first functionality of the app: you pick a spot and see what your action of adding a new health facility means. These are all based on the data generated back in 2010 to 2013, and there is an underlying model: a generalized additive model with travel time and some other covariates driving the predictions you see. Now, say you're done adding new health facilities and playing around with that part of the app. You can also ask the app to optimize the locations for you. You choose the criterion you want, for example, in this case I want to reduce the district-wide prevalence in children under five as much as possible, and say I want to add three new health facilities. I click the optimize button, and it tells me, in purple, the three locations where I should add the new health facilities, along with the resulting district-wide prevalence, incidence, and travel time. This gives people an idea of where to add facilities if they want to minimize a particular metric. Furthermore, it also gives an idea of how much you can actually achieve by exploiting these correlations, which, it turns out, is not a lot: a reduction of 0.6% in prevalence against a baseline prevalence of 40%. Travel time, on the other hand, is reduced quite substantially, and that is still a good outcome, because you want to bring healthcare to the people so they don't have to travel as long to reach a health facility. So it's not a bad thing. But the point is that if you were to fight malaria by building new health facilities, the return is probably not as large as a blanket statement like "travel time to health facilities is a strong predictor of malaria prevalence" would suggest. When you actually work it out with pen and paper, or in this case mouse and keyboard, you realize the effect is not as strong as you might think. So that is the app I'm showcasing here.
The point is that when I designed this app, from the very beginning to what you see here, a lot of the consideration was on the speed of the calculations, on whether my app is responsive enough. And I've gotten to the point where I think the app does a pretty good job: it makes all these predictions on top of letting people interact with the map, and I can keep adding new health facilities and updating predictions, and it usually takes less than a second to get the new visualizations. That is very much what was on my mind when designing a relatively complicated, elaborate app: the usability of the app. I really want to make sure the app processes things quickly so that people don't get bored or turned off by it. So for the next half hour or so I'm going to walk you through some of the considerations I bring to creating a Shiny app that feels responsive enough for the user, such that the user experience is there. There is a lot of thought behind user experience as a whole, and usability is only one part of it; I'm by no means an expert on user experience, but this is something I learned by developing Shiny apps. Before I get to what is promised in my title, improving the speed of Shiny apps by pre-computing models, I want to give a big shout-out to coding efficiently. Often we think, well, we are not professional R coders, we are not computer scientists, so let's not think about optimizing our code. But a large part of my career, and a large part of my PhD research, has been about writing code to achieve something, and I feel that for many of us in the same boat it really is worth investing time in improving our coding efficiency. This is a great book; I've gone through a large part of it and there is lots of good advice in it, covering efficiency from start to finish. You can check it out by clicking the link. For this presentation I want to focus on a few points that I have found very useful myself, and I hope they are useful to you as well. First, the good old story of vectorization over loops. If you find yourself doing a relatively simple task with a for-loop, sometimes you really want to ask yourself whether there is a vectorized equivalent. Often there are native functions, or functions someone else has written, that are already well optimized and already use vectorization, and we want to leverage those. Second, if you look at RStudio, there's a button called Profile. I'm not sure if you have ever used it, but I've used the profiler many times, not just when writing R code but also when writing Shiny apps, to gain insight into which parts of my code take a long time to execute; I'll briefly demonstrate it later. And finally, as much as we sometimes want to stick to base R, the tidyverse and data.table packages can be really important and useful for saving time as well, and I'll demonstrate that in the three demos in the following slides.
I don't really have a good example of how to do vectorization well in general, so instead I'm going to tell you the story of a question that kept coming up in my mind. I've used R for quite a number of years, but one thing I still hadn't figured out until last year is how to calculate a moving average in an elegant way. You can download the R script here; I'm going to jump over to the script, demo1.R, to tell this part of the story. For this script I rely on dplyr and on a package called microbenchmark, which is very helpful for learning about the execution speed of your code, especially when the code runs too quickly for you to capture the running time yourself. I start out by generating a random time series: set a seed and use an AR model to generate it. You see a random time series here; imagine that from day zero to day 1,000 you're measuring some value on the y-axis that starts at zero and goes up to a little more than a thousand. As I said, for the longest time I didn't know the best way to calculate a moving average, so I used to write a function with a for-loop to calculate it, like this. Then last year, when I was involved in a lot of COVID-19 visualizations, I finally found out from the internet that there is a rather obscure-sounding function called filter that comes with base R. Because it always clashes with the dplyr package, which also has a filter function, I add the prefix stats:: here. It's not very intuitive, but this is exactly how you calculate, for example, a seven-day moving average. In this microbenchmark call I compare the first method, the for-loop, in the first position against the second method in the second position. So I run microbenchmark, and let me enlarge the output. Again, you want to use microbenchmark here precisely because the code executes very quickly, but you can see that for the first method, the for-loop, the mean execution time is about 14 milliseconds, that is, 0.014 seconds, whereas the native way of calculating the moving average takes only about 1/100th to 1/150th of the time the for-loop takes. The idea is that the function uses vectorization behind the scenes: it passes the calculations down to the foundation of R, which is C, and lets C do the job, and C does this kind of job much, much better than R. The function does all of this for you, and that's why you see much better calculation speed compared to a handwritten for-loop. And just to make sure I'm doing it right: it looks like I am calculating my moving average correctly in this case.
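Roughly, the comparison in demo1.R looks like the following sketch; the data-generation step and the window length are stand-ins for what's in the actual script, but the for-loop versus stats::filter() contrast is the same.

```r
library(microbenchmark)

set.seed(42)
x <- as.numeric(arima.sim(model = list(ar = 0.9), n = 1000)) + 500  # a random AR(1) series

# Hand-written for-loop: trailing 7-day average
ma_loop <- function(v, k = 7) {
  out <- rep(NA_real_, length(v))
  for (i in k:length(v)) out[i] <- mean(v[(i - k + 1):i])
  out
}

# "Native" way: stats::filter() with equal weights, computed in compiled code.
# The stats:: prefix avoids the clash with dplyr::filter().
ma_native <- function(v, k = 7) {
  as.numeric(stats::filter(v, rep(1 / k, k), sides = 1))
}

microbenchmark(loop = ma_loop(x), native = ma_native(x))
all.equal(ma_loop(x), ma_native(x))   # sanity check: both give the same answer
```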
That was a bit of a refresher: many of us are aware of the power of vectorization and native functions compared to for-loops, but it's a good reminder that a for-loop can sometimes take a lot more time than these native ways of doing things. Now, the second demo is much closer to what you saw in my Shiny app. Sometimes we rely on functions developed by someone else that do not support vectorization, and that was unfortunately my case. Remember that the app I showed you reports incidence. We actually didn't collect any data on incidence: in the survey done back in 2010 to 2013 there was no incidence data. In fact, incidence data is notoriously difficult to collect, because you need to go to the health facilities, and they haven't always kept good records until recent years. So we rely on the Malaria Atlas Project, a group that was based at the University of Oxford and is now, I think, at the University of Western Australia. They built a statistical model relating malaria prevalence and incidence. The prevalence-incidence relationship is not linear, so we need a conversion equation, I think it's a polynomial, to convert malaria prevalence to malaria incidence. But before I can even do that, I need another conversion. The malaria parasite prevalence we collected on the ground is from children zero to five years old, but the equation the Malaria Atlas Project built converts malaria prevalence of two- to nine-year-olds to incidence. So we first need to convert our prevalence from zero-to-five-year-olds to two-to-nine-year-olds. Thankfully, back in 2007 the same team created a package to do this job for us. I'm going to move over to the R script, which you can also download; I think Jodi has shared the link in the chat as well. This is the second demo script. First of all, I rely on that package, and R these days lets us install packages straight from GitHub; this is how you would install the team's package from GitHub. I have already installed it, so I can just load it with library(). In this demonstration, which is very much what happens in my app, imagine I have an area of 100 by 100 pixels, and each pixel has a prevalence value, so in total I have 10,000 prevalence values that I need to convert, first from zero-to-five-year-olds to two-to-nine-year-olds, and after that from prevalence to incidence. So let's create some random numbers between zero and one, since prevalence is a number between zero and one: 10,000 values. Here I'm going to run the profiler, just to give you a taste of what a profiler looks like if you haven't seen one. I select the lines for the first way and the second way that I'm comparing, click the Profile menu, and you can see there's a Profile Selected Lines option; I click it. It's going to take a while, and in the meantime I'll explain both ways.
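In outline, the two ways being compared look something like the sketch below. Here slow_convert() is only a stand-in for the package's element-wise conversion function, whose real name and arguments I'm not reproducing; the only property the trick relies on is that the conversion is one-to-one and monotonically increasing.

```r
# Stand-in for the package's conversion function: it processes values one by
# one in a loop, so each extra value costs time (the sleep mimics that cost).
slow_convert <- function(p) {
  vapply(p, function(x) { Sys.sleep(0.001); x^1.3 }, numeric(1))
}

set.seed(1)
prev <- runif(10000)    # 10,000 pixel-level prevalence values between 0 and 1

# Way 1: push all 10,000 values through the slow function (slow)
# res1 <- slow_convert(prev)

# Way 2: convert only a grid of 1,001 values (0, 0.001, ..., 1) once,
# then look the rest up with quantile(), which interpolates linearly
pre_calc <- slow_convert(seq(0, 1, by = 0.001))
res2 <- quantile(pre_calc, probs = prev, names = FALSE)
```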
The first way is to put those 10,000 values directly into the convert prevalence function provided by the package; here I need to specify that the values I put in are prevalences for zero-to-five-year-olds. The profiler result has come up, but let me explain the code first. For the first way, I put the 10,000 values into the function directly. But I know that if I print the function in my console, this function created by the Malaria Atlas Project group relies heavily on a for-loop: it makes the calculations one by one, first value, second value, third value, all the way down to the 10,000th value, and that is going to take a lot of time. We can look at the profiler (indistinct) to see how long it takes. I'm not going to accept that; I'm not going to make my users wait ten seconds just for this conversion to be done. So here's another way that I propose. Because the prevalence-to-incidence conversion is one-to-one and monotonically increasing, I can use that to my advantage. Instead of running 10,000 values, I convert 1,001 values, specifically the sequence 0, 0.001, 0.002, 0.003, all the way up to 0.999 and 1. Once I've calculated those values, stored as pre_calc, I can use the quantile function: I pass in my vector of prevalences and pull values based on the pre_calc vector. This works purely thanks to the fact that the conversion is monotonically increasing, and because of it I only need to put 1,001 values into the conversion function. You can see the result in the profiler. The profiler is great because it tells us, line by line, the time taken to execute each line. The graphic at the bottom usually doesn't make a lot of sense, but the one at the top is very useful. What the profiler tells me is that my first way takes 17,000 milliseconds, that is 17 seconds, to fully convert the 10,000 values. My second way takes 1,700 milliseconds, 1.7 seconds, to run the first line, and then 10 milliseconds to run the second line using quantile. So in total I achieve the same task in about a tenth of the time compared to blindly using the conversion function provided by the group back in Oxford. In fact, in the profiler you can also open the Data panel and drill into the conversion function to see which steps are taking most of the time, for example invert PF; sometimes there's more information there if you know what's going on in that section of the function. This kind of inspection has really helped me a lot in my R coding. Just to close this demo out, I want to show you that the conversions from both ways are more or less exactly identical, so you know I'm not cheating you with my second way. This is very much what's happening in my Shiny app. The third demonstration is also relevant to a lot of my other Shiny apps that I don't get to present today: the idea that sometimes we rely on an external source and read in large tables, big CSV files.
Often when this kind of thing happens, for example when the back end of your Shiny app reads in hundreds of thousands of rows of CSV and does some data manipulation before turning it into a visualization, you really want to use packages like the tidyverse or data.table for those manipulations. The R script supporting this demonstration is here, and I'm going to go over to the code in RStudio. In this demonstration I'm going to compare the tidyverse packages with the alternatives. The tidyverse includes dplyr, which we know, and tidyr; readr is another package that comes with the tidyverse, and as the name suggests it's what you use to read in data tables, for example CSVs. The data.table package is the MVP package when it comes to big-data analysis, and then there's microbenchmark. The first step of the demonstration is generating a fake data frame: I specify the number of columns and rows I want, and it generates a fake data frame full of numbers. I create a result table, and I'm going to run this chunk first because it takes some time, and explain it while it runs. In this chunk I ask three functions, read.csv from base R, read_csv from the readr package, and fread from data.table, to read a CSV, first with 10 rows of data, then increasingly 100 rows, 1,000 rows, 10,000 rows, and 100,000 rows. At each step I record the amount of time used to execute the function; system.time lets me do that, and I record the results into the empty data frame. Great, it's done, just in time to plot the results; we can also look at the table. You can see from the plot that by 10^4, that is 10,000 rows of CSV, the performance of base R starts to be really poor, and by 100,000 rows the time taken to read the CSV goes up to about 50 seconds. Your mileage may vary; I'm using an eight-year-old computer right now, so it takes a lot longer than a faster machine would. For the readr package, the performance up to 10,000 rows is still not bad, but at 100,000 rows it starts to deteriorate. But with the data.table function, fread, the performance at 10^5 rows is still pretty good: it takes about 0.34 seconds to load that CSV. And really, if we don't need to rely on an external source of data such as a CSV file, we can store our big tables in a format called RDS instead. RDS is a binary format specific to R, used to store R objects. You can use a write-RDS function to write the data frame into a file called, for example, temp.RDS, and if you time how long it takes to read the RDS back, it's only about 0.11 seconds.
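A stripped-down version of that reading comparison, with a single table size rather than the full loop over sizes, might look like the sketch below; the file name and dimensions are just for illustration.

```r
library(readr)
library(data.table)

# Fake data frame: 100,000 rows of numbers, written out as CSV
n  <- 1e5
df <- as.data.frame(matrix(rnorm(n * 10), nrow = n))
write.csv(df, "temp.csv", row.names = FALSE)

system.time(read.csv("temp.csv"))           # base R
system.time(readr::read_csv("temp.csv"))    # readr
system.time(data.table::fread("temp.csv"))  # data.table

# If the data never change, store an RDS (R's binary format) instead of a CSV
saveRDS(df, "temp.rds")
system.time(readRDS("temp.rds"))
```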
So whenever possible, you should take advantage of RDS files. Next, a quick demonstration of filtering. Here I'm comparing four approaches: the first is base R filtering; the second uses the dplyr filter function; the third converts my data frame to a data.table object first and then filters it the data.table way; and the fourth uses a data.table object I already have and filters it directly. This one also takes a while to run, which means I get a breather before we talk about it. You can see here the performance of these packages compared to base R. In this case you don't get as dramatic an improvement as with reading CSVs, but the mean time of running the dplyr filter function, or of filtering the data.table way, is still about a third to a quarter of what you'd spend using base R. So if your Shiny app does a lot of these manipulations behind the scenes, you really want to consider using the tidyverse or data.table to your advantage. Those are the three points about efficient coding that I feel strongly about and have been applying to a lot of the Shiny apps I've created; I hope these small demonstrations are useful for coding efficiently in your Shiny apps or in your general R coding. The next part is closely related to the second demo I showed you: we pre-calculate things and store them. When you store something like pre_calc, you can keep using it over and over again until your user is tired of using your Shiny app. You do the calculation once and keep reusing it, and that is the philosophy behind a lot of my Shiny app development. It's extremely helpful to write out your workflow and then analyze which parts of it can be stored away, or pre-calculated, if they are taking too much time. Looking back at the health facilities app I showed you, here is the workflow for the first functionality as I sketched it out in my mental model. First I load the data, then I fit the GAM model; both of these are really fast. Then I ask the user for new coordinates and calculate new travel times based on them. This is the part I wasn't too sure about, because it could have taken quite a bit of time, but it turned out the calculation was actually pretty fast, so I was lucky there. Finally, I produce new predictions using the GAM. It turns out that for GAMs, GLMs, and other linear models, the predict functions they provide are just a bunch of matrix computations behind the scenes, so they are generally very fast, even when you have tens of thousands of predictions to make. So in my Shiny app, I decided that loading the data and fitting the GAM model didn't belong inside the app.
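Going back to the filtering comparison for a moment, a minimal sketch of the four approaches might look like this; the column names, sizes, and thresholds are illustrative rather than taken from the actual demo script.

```r
library(dplyr)
library(data.table)
library(microbenchmark)

n  <- 1e5
df <- data.frame(id = seq_len(n), value = rnorm(n))
dt <- as.data.table(df)   # data.table object prepared in advance

microbenchmark(
  base            = df[df$value > 1, ],             # base R subsetting
  dplyr           = dplyr::filter(df, value > 1),   # dplyr
  dt_convert_then = as.data.table(df)[value > 1],   # convert first, then filter
  dt_ready        = dt[value > 1],                  # already a data.table
  times = 50
)
```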
Back to the health facilities app: what I did was store the GAM model away, and when I need it, I pop the pre-fitted GAM into my Shiny app and use it for predictions in the later part of the workflow. That is the workflow I decided on for the app. Just to show you how you can store your models, in case you're not sure how to do it: I use mgcv to fit my GAM. This code just creates some fit data and then fits the model with gam(), and in this GAM I have a two-dimensional spline. Once the GAM model is fitted, I use the save function: I save this object called mod into a file called gam_mod.RDA. I'm not sure what the significance of the .RDA extension itself is; I'm pretty sure you could use another extension, but when I learned these functions someone else used .RDA, which seems to stand for R data, so that's what I use. I save it away, so now there's a gam_mod.RDA file on disk, and when I need it I use the load function to load that file. When I then run mod, you can see the model is back in my session. So this is a great way of storing your model away and popping it back up when you need it. We also have the optimization functionality in the health facility app. Again, this is the workflow for the optimization functionality, and there's a particular step that takes forever: the spatial optimization. We use a genetic algorithm to find the best spots to place the new health facilities, and this takes an extremely long time. What I realized is that I can pre-optimize and store the resulting coordinates in a CSV file, so when I need them I just pull these pre-optimized coordinates and show them to my users. This literally turns 30 minutes of fitting time into zero seconds. If you want to have a look, this is roughly how I organize that CSV file: this column is the number of health facilities, so when my user asks for three health facilities, these are the three longitude-latitude pairs I display, together with the metric; you can choose different metrics and they have different corresponding coordinates to display. So in a way I'm turning my Shiny app from a big calculator into a not-so-big calculator and more of a visualizer. I think this is extremely useful for those of us who often have to use things like optimization, or anything we know is going to stay static; it's a very helpful way of designing a Shiny app. I'm not sure, are we on time here, or can you spare a few more minutes? - I think, go ahead. If people need to leave, feel free to leave, but we're recording and we can make it available for people who need to leave, so keep going. - Cool, yeah. I'm just going to address a few more issues. I said that predictions from a GAM are very fast, so we can just embed the model into our Shiny app, but sometimes our model is very complicated, and making predictions directly from it isn't that simple or fast. This is work I've done with another group at the University of Florida.
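As an aside, the save-and-load pattern for the GAM described above might look roughly like this toy sketch; the model, data, and file name here are stand-ins, not the actual travel-time model.

```r
library(mgcv)

## Offline, one-time step: fit the model and store it away
set.seed(7)
fit_dat <- data.frame(x = runif(500), y = runif(500))
fit_dat$z <- rbinom(500, 1, plogis(sin(4 * fit_dat$x) + cos(4 * fit_dat$y)))
mod <- gam(z ~ s(x, y), family = binomial, data = fit_dat)   # two-dimensional spline
save(mod, file = "gam_mod.RDA")

## Inside the Shiny app: load the pre-fitted model and only ever call predict()
load("gam_mod.RDA")   # brings 'mod' back into the session
new_dat <- data.frame(x = runif(10), y = runif(10))
predict(mod, newdata = new_dat, type = "response")
```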
That group at the University of Florida came up with a very sophisticated, complicated model of the distributions of Aedes aegypti and Aedes albopictus, two mosquito species known to transmit dengue, chikungunya, and I think Zika virus as well. They use a bunch of environmental covariates, and they built a zero-inflated negative binomial model with site-level and county-level random effects. It's a very complicated model, and on top of that they rely on a package called glmmTMB to fit it. The problem is that fitting the model takes hours, and worse, if we use the predict function that comes with the glmmTMB package, there's a lot of computation behind it to produce extremely accurate predictions, including the uncertainty, and that takes an extremely long time. In that case things get tricky for us, but it helps to realize that a model is essentially a bunch of estimated parameters, and sometimes what we really want to do is dig into the model, pull the parameters out of the glmmTMB object, and calculate the predictions ourselves. This requires some understanding of the statistics behind the model, but I think it's extremely useful here, because we are turning an hour of prediction time into essentially zero seconds, or one second, of calculation. (A rough sketch of this parameter-extraction idea appears at the end of this section.) The app is still under development, so I was only going to show you briefly what has been done, to give you a feel for what happens when you pull these estimated parameters out and use them to calculate your predictions yourself instead of relying on the package's built-in function. This is taking a long time to load; let's hope it doesn't. Well, it seems someone somewhere is not very happy with me; Shiny is not happy with me, so I won't be able to show you. But the idea is that we have a control panel that lets people change the county and the environmental covariates, and they get to see the predictions almost immediately. This is the kind of workflow I very much recommend for people using a very complicated model. For example, sometimes we fit multilevel hierarchical models, and it often comes down to storing the posterior samples of the estimated parameters; then, in the Shiny app, we use these pre-stored, pre-fitted posterior samples and just do the matrix computations behind the scenes. Matrix computations are very fast and generally don't take a lot of time. That is very much what I had to show you on today's topic. I'll close out my presentation with some final random tips that are close to my heart. I've been doing a lot of geospatial visualization, and I find that sf is generally faster than sp when it comes to shapefiles and the like, but sometimes we still rely on sp because it works better together with rasters. High-resolution shapefiles can take much, much longer to render in plots and leaflets. I didn't get to show you the app I wanted to show just now, but that would have been an example where the Florida shapefiles turned out to be too detailed, and I had to simplify my shapefiles using, for example, the gSimplify function from the rgeos package.
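To make the parameter-extraction idea concrete, here is a rough sketch of pulling the fixed effects out of a zero-inflated glmmTMB fit and computing the expected count by hand. The covariate names, formula, and file name are made up, and random effects are simply ignored (set to zero) here, which a real app would need to handle deliberately; this is a sketch of the pattern, not the group's actual prediction code.

```r
library(glmmTMB)

# Pre-fitted zero-inflated negative binomial model, saved after the hours-long fit
fit <- readRDS("zinb_fit.rds")        # hypothetical file name

beta_cond <- fixef(fit)$cond          # fixed effects of the conditional (count) part
beta_zi   <- fixef(fit)$zi            # fixed effects of the zero-inflation part

# New covariate values chosen by the user in the app; the formula must match
# the fixed-effect structure the model was fitted with (placeholders here)
new_dat <- data.frame(temp = 25, rain = 120)
X_cond  <- model.matrix(~ temp + rain, new_dat)
X_zi    <- model.matrix(~ temp + rain, new_dat)

mu     <- exp(drop(X_cond %*% beta_cond))   # conditional mean (log link)
p_zero <- plogis(drop(X_zi %*% beta_zi))    # zero-inflation probability (logit link)
(1 - p_zero) * mu                           # expected count: just matrix arithmetic
```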
One final tip, and this one gave me a heart attack a few hours before this presentation: something went wrong with the code in the health facilities Shiny app I showed you. It turned out to be because functions like filter and select exist in multiple packages, and sometimes R has no idea which one you're referring to. So do remember to bulletproof your code by adding the dplyr:: prefix, for example, in front of your filter and select calls. That's it for the presentation; thank you very much. Here are my contact details. Feel free to send me questions, comments, and suggestions, and if there are any questions now, I'd like to hear them. - All right, thanks Ben for that presentation. There are at least a few questions so far. The first one is more of a comment: there's also the rollapply function that can be used to calculate moving averages. - Yeah. - The first actual question is: is there a way to use the profiling tool to measure the speed of functions in Shiny? For example, quantifying reactive changes to a map or a graph, something like that. - I'm not aware of a way to use a profiler directly with Shiny apps. I think it was last week's presentation that covered ways to test your Shiny apps with certain preset inputs, and I think that would be the best way: feed in an input, one at a time, and profile that code. But I'm not aware of any great way to use a profiler directly with Shiny apps. - Okay, and another question asks: what's the difference between a .RData file and a .RDS or .RDA file? - That is a very good question. I've definitely asked that myself, but I didn't quite find the answer. My gut feeling is that RDS and RDA both work the same way, so you most likely can write the model into an RDS without problems. That's as much as I can say about it. - Okay, and another question asks if you know whether there's a speed difference between plotting methods, for example ggplot versus base R. - Not so much. I haven't seen a particular difference between ggplot and base R plotting. But I do want to say that certain things you do can make one plot take longer than another; small things like the choice of colors or the size of the points actually make a difference. - Okay, well, I think those are all the questions. Oh, there's one more: someone posted a link to a profiler that works with Shiny, although it hasn't really been tested much, or they haven't tested it much. - Oh, great, that's great to know. (keyboard clacking drowns out speaker) Profiling in Shiny, that's great. So you actually have run examples that you can (indistinct) through. I think it was last week's presentation, or two weeks ago, that showed you can use an example to test your Shiny app, and then you can add a sort of wrapper around the (indistinct), yeah. Thank you so much for the comments.
Info
Channel: Ecological Forecasting
Views: 34
Id: O_R42SWrJ34
Length: 64min 27sec (3867 seconds)
Published: Wed Sep 01 2021