- Everyone can see the screen now. Yeah, thank you very much for having me. So I'd like to share with you a little bit of my experience through the journey of creating Shiny apps. First of all, I must emphasize that I'm by no means an expert or
professional in creating apps, let alone Shiny apps, but there are a lot of
things that I learned through creating some of these apps, and it is a great honor to
have a platform like this, and to talk about some
of these experiences. Okay, so a little bit of
a sort of housekeeping. You can actually access the
slides through this link. So bit.ly/Shiny831, and there are some R scripts
that I'm gonna be using for this presentation, and once you access the slides, you can actually download those from there as well. Okay. So, well, yeah, I should
go back to the title for a little bit. As the title suggests, today's presentation is gonna focus a lot more on sort of the usability of our Shiny apps. When it comes to creating an app, we really want to make sure that our app is responsive, is fast enough, so that people don't, you know, click a button and wait for, I don't know, 10 seconds for another screen to come up or something like that. And oftentimes this involves a good design of, you know, the workflow behind what's going on in the server side of your Shiny apps. But it also involves a lot
of efficient coding as well. So that is kind of like what
I'm gonna walk you through, some of my experience in
creating some of these things. But first I will demonstrate
one of the Shiny apps, sort of quite elaborate
ones that I have done as part of my PhD research. And this is specifically on looking at the relationship
between travel time for health facilities and malaria. So malaria is still a devastating disease affecting hundreds of millions
of people around the world. A lot of us might not have
heard much or worried much about malaria, but for people
in Sub-Saharan Africa, for example, this is where malaria is still affecting the economies
and livelihoods of people. For this project I worked
with my PhD advisor, Dennis Volley, and my
former lab mate Justin Mila, and also our Ghanaian collaborators. We focused on a district in the northern side of Ghana called the Bunkpurugu-Yunyoo district, a pretty small district up here at the border of Togo. And what you're looking at right here is the result of a survey
done back in 2010 to 2013. What you're looking at is the malaria prevalence of the district here. Specifically, the survey focused on children under five. This is the age group that
is most affected by malaria compared to like the
older children and adults. So as we can see from the map here, even within the district itself, the Southern regions of the district experience very high malaria prevalence, up to like 80% or 90% down here. And then in the Northern area you get a much lower prevalence, like 10% or 20%. There are two town centers in this district, called Napanduri here and Bunkpurugu. So based on the data that we collected and based on the statistical
analysis that we've done, it turns out that one of the very strong predictors of malaria prevalence here is actually distance, or travel time, to health facilities. When we say this statement out loud, it really makes sense that, well, if that's the case,
well, if that's the case, why not we build a bunch
more health facilities all over the district, and this would probably help to bring down the malaria prevalence. And there is a little bit
of a causal relationship there, because we know that early treatment and diagnosis of malaria can actually stop malaria transmission as well. So that is the whole idea of this app. We want to let people explore what happens if you add more health facilities onto the map: what does that mean for malaria prevalence or malaria incidence? So as you can see from the metric here, you can choose to see malaria incidence, malaria prevalence,
or what does it mean in terms of travel time to
the nearest health facilities? The whole idea of the app
is that when people interact with the map and try to
add more health facilities, they get an idea of what the numbers from our data really tell us, right? And whether or not it makes sense. And if it makes sense to us, where is the best place to put our new health facilities? Okay, so that is the whole idea. And what you're seeing here, there are eight health facilities that exist in the district. Five of them are in blue; they are the health centers,
and three of them are in red; they are the sort of community-based, mobile health posts. Okay, so lots and lots of
things that you can look at. Again, if you look at travel time, for example, on the map, you can see that in the Southern regions of the district the travel time to the nearest health facility tends to be large. So people have less access to healthcare in the Southern regions of
the district, for example. Okay, so, what we ask the user to do is, well, you can choose where you want to add a new health facility. Say, for example, I look
at this map, I say that, well, this region's people
have less access to healthcare, let's add a health facility here. So click on that, a red dot appears here, and I can say add a facility here. Okay, once I add the facility, I say, update the
predictions and let me see what it means after I add this health facility here. Okay, once I've added it, perhaps it's easier to look at the difference from the baseline, because we can now see what has changed. So right now we are
looking at travel time. You can see that by adding
a health facility here we improved the travel
time in the close vicinity of this health facility,
it all makes sense, right? But you can see that the
impact is kind of localized. The predicted impact, I must
add, is kind of localized here. And you can also look at prevalence; again, the changes are pretty much localized to what you see here. Okay, so, again, incidence is even
lower, it seems like. Okay, and then over here, this would show the
district-wide prevalence, district-wide incidence, and district-wide travel time per person. And you can see that the reductions are actually very, very small from adding the health facility here. So it is kind of strange, right? Apparently I have improved things, I
have added health facilities onto a spot where the
travel time was high, so what happened here? That's because there are
a lot of interactions with the underlying populations. So there were simply more people living in the Northern area of the district compared to the Southern
area of the district. So when you improve the travel time, you improve the travel time for maybe like a small
portion of the people. And as a result, you
don't see a big impact when it comes to
district-wide kind of metrics. So let's say now that I
have this mental model, I go back and say, okay, let's choose a spot
that is near the roads. It's not very clear
here, but it's kind of, you can see that there are three roads leading to this particular
spot called Yunyoo. It's sort of a very small
town, rural town here. Let's add a health facility here. Okay, so once I've added it, again,
I click update predictions. Not too obvious, but I can
look at the difference, and I can see that the improvement in terms of prevalence is a whole lot more; you can see that the reach of this new health facility is a lot greater compared to the spot just now. And you can also see that travel time is also improved, just marginally, for a lot of regions around this area. Basically you see that the
impact area is slightly larger. When it comes to the district-wide metrics, again, you can see that the numbers, the reductions from the baseline, tend to be larger as well, simply because we chose a spot where the population is higher. Okay, that is very much the first functionality of the app: you get to pick a spot and see what it means in terms of your action of adding new health facilities. These are all based on the data that was generated back in 2010 to 2013. There is an underlying model; we are using a generalized additive model, with distance and some other covariates driving the predictions that you see. Okay, so let's say you're done adding new health facilities and playing around with
this part of the app. You can also ask the app to
optimize the locations for you. Choose the criteria that you want, for example, in this case, I want to reduce the
district-wide prevalence of children under five
as much as possible. And say, for example, I want to add three new health facilities. I can click the optimize button here, and it will tell me in purple these are the three
locations where you should add the new health facilities. And this is the performance in terms of district-wide prevalence, incidence, and travel time, based on the three new health
facilities that are optimized. Okay, so this sort of gives people an idea: well, if you were to minimize some of these metrics, where are the locations that you would want to add facilities? Furthermore, this also actually gives a little bit of an idea of how much you can actually achieve by just utilizing these correlations that we have. Which is to say, it actually
is not a lot, right? A reduction of 0.6% prevalence compared to a baseline prevalence of 40%. Although in terms of travel time, you do reduce it quite substantially, and that is also a good thing, because you want to bring healthcare to the people, so that people don't need to travel as much time to the health facilities. So it's not essentially
a bad thing as well. But the idea is that if
you were to fight malaria by building new health facilities, you really want to think that the return is probably not as much as
what a blanker statement of, travel time to how
facilities is significant, is a strong predictor
of malaria prevalence. But when it comes to
actually using pen and paper, and draw things out, in this
case mouse and keyboard, you do realize that the
effect is not as strong as you would think. So that is very much the app
that I'm showcasing here. But the idea is that
when I designed this app from the very beginning
to what you see here, a lot of consideration is on
the speed of calculations, whether or not my apps
are responsive enough. And I come to the point
where I think that, hey, you know what, my app has
been doing a pretty good job, I must say, in doing all these predictions on top of letting people interact with some of these elements. I can keep adding new health facilities and updating predictions, and these usually take less than a second to produce new visualizations. That's very much what's on my mind when I design a relatively complicated, elaborate kind of app: it's all about the usability of my apps. And I really want to make sure that my app can process things quickly, so that people won't get bored or get turned off by it. And so for the next
maybe half an hour or so I'm gonna walk you through
some of the considerations that I bring when it comes
to creating a Shiny app that feels responsive enough for the user, such that the user experience is there. Again, there is a lot of thought behind user experience as a whole, and usability is definitely only one part of user experience. And by no means am I an expert on user experience, but this is something that I learned by developing Shiny apps. Before I go into what's
kind of promised in my title, that is, improving the speed of Shiny apps by pre-computing models, I really want to give a big shout-out to coding efficiently. Oftentimes we think, oh, you know, we are not professional R coders, we are not computer scientists, let's not think about optimizing our code or coding efficiency. But for me, a large part of my career, a large part of my PhD research, has been about writing code and using code to achieve something. And I feel like for many of us in the same boat it really is something worth investing our time in, to improve our code efficiency. This is a great book, I've gone through quite
a large part of it. There is really lots and lots of good advice in that book; you can check it out, click the link. I think it covers all sorts of efficiency advice from start to finish. But for this presentation I want to focus on a number of points that I have found very, very useful myself, and I hope they're useful to you as well. First, the good old story about vectorization over loops. If you find yourself doing
some relatively simple task that you know how to do using a for-loop, sometimes you really want to ask yourself: is there a vectorized equivalent of the method that you're using? Oftentimes there are a lot of native functions, or functions someone else has written out there, that are already very optimized and already utilize vectorization, and sometimes we want to leverage some of these functions. The second point: if you look at your RStudio, there's a button called Profile. I'm not sure if you have ever used it, but I've been using profilers lots and lots of times in my journey of not just writing R code, but also writing Shiny
apps to really gain insight about which part of my code
is taking a very long time to execute. I'm also gonna briefly
demonstrate that to you later. And finally, as much as we sometimes want to stick to base R, the tidyverse and data.table packages can be really, really important and useful for our R coding in terms of saving time as well. And I'm also gonna demonstrate that in the three demos in the following slides. So I don't really have good examples of like how exactly to make
good vectorization and whatnot, but I'm gonna tell you a story about a question that always came up in my mind. I've used R for quite a number of years, but one thing that I still couldn't figure out until last year was how exactly to calculate a moving average in an elegant way. You can actually download the R script here. I'm gonna jump over to the R script, demo1.R, to tell this part of the story. Okay, so for this script
I rely on dplyr, and I also rely on this package called microbenchmark. It is a very helpful package for really learning about the execution speed of my code, especially when the code runs too quickly for you to capture the running time otherwise. Okay, so I start out by generating a random time series. So set a seed and use
sort of an AR model to generate a random time series. And you see the random time series here: imagine that from zero to a thousand days you're measuring some values on the Y axis that go from zero all the way up to maybe a little more than a thousand. As I said, for the longest time I didn't know the best way to calculate a moving average, and it turned out that I had to create a function and write a for-loop to calculate the moving average like this. Okay, and then last
year, when I was involved in a lot of COVID-19 visualizations, that's when I really dug through Google, through the internet, and found out that there's actually a pretty obscure-sounding function called filter that comes with base R's stats package. Because it always clashes with the dplyr package, which also has a filter function, I add the prefix stats:: here. It's not very intuitive, but this is exactly how you calculate a seven-day moving average, for example. So in this microbenchmark call I compare the first method, placed first, with the second method, placed second, to calculate a moving average, and I run microbenchmark on both.
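Just to make the comparison concrete, here is a minimal sketch of the two approaches; the data and function names are illustrative, not the exact ones in demo1.R.

```r
library(microbenchmark)

set.seed(831)
x <- cumsum(rnorm(1000))  # a random walk standing in for the time series

# Method 1: hand-written for-loop
ma_loop <- function(x, k = 7) {
  out <- rep(NA_real_, length(x))
  for (i in k:length(x)) out[i] <- mean(x[(i - k + 1):i])
  out
}

# Method 2: base R's stats::filter with equal weights (the stats:: prefix
# avoids the clash with dplyr::filter)
ma_filter <- function(x, k = 7) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

microbenchmark(loop = ma_loop(x), vectorized = ma_filter(x))
```

With sides = 1, stats::filter uses only past values, so the two methods return the same trailing moving average.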
I haven't run everything yet. Okay, let me enlarge this. Okay, so again, you want
to use microbenchmark especially when your code executes very fast, but in this case you can see that for the first method, using a for-loop, the mean execution time is about 14 milliseconds. A millisecond is 10 to the power of negative three seconds, so that's about 0.014 seconds, compared to the so-called native way of calculating the moving average, which takes only about one hundredth to one one-hundred-fiftieth of the time that the for-loop takes. The idea here is that the function utilizes vectorization behind the scenes. It also passes a lot of these calculations down to the foundation of R, which is actually the C language, and lets C do the job, and C does this kind of job much, much better than R. So the function does all these things for you, and therefore you see a much better calculation speed compared to a handwritten for-loop. And just to make sure that I'm doing it right: it seems like I'm calculating my moving average correctly in this case. Okay, so that is just a little bit of a refresher; I think a lot of us are aware of the power of
vectorization and native functions compared to for-loops, and if that's the case, it's good to remind yourself that a for-loop can sometimes take a lot more time than some of these native ways of doing things. Now, the second demo is a lot closer to what you saw in my Shiny app just now. Sometimes we rely on functions developed by someone else that do not support vectorization, and this is very unfortunate in my case. So remember the app that I showed you just now:
there's an incidence metric, right? But we actually didn't collect any data on incidence. In this survey done back in 2010 to 2013, there was no incidence data. And in fact, incidence data is notoriously difficult to collect, because we need to go to the health facilities, and they haven't always kept good records until recent years, for example. So we rely on the Malaria Atlas Project, which is a group that was based at the University of Oxford. Now, I think, it is at the University
of Western Australia. They were the ones who made a statistical model to capture the relationship between malaria prevalence and incidence. Okay, the prevalence-incidence relationship is not linear, and as a result we need some sort of conversion equation or formula, I think it's a polynomial, to convert malaria prevalence to malaria incidence. But before I can even do that, I need to do another conversion. The malaria parasite prevalence that we collected on the ground is actually from children
of zero to five years old. But the equations that the Malaria Atlas Project worked with were actually for children from two years old to nine years old; they convert malaria prevalence of two- to nine-year-olds to incidence. So we need to first convert our malaria parasite prevalence from zero-to-five-year-olds to two-to-nine-year-olds. And thankfully, again from the same team, back in 2007 they actually created a package to do this job for us. And I'm gonna move over to the R script. And you can actually
download the R scripts. I think Jodi has shared the
link in the chat as well. Okay, so this is the second demo and its R script. Okay, so first of all, I rely on the package, and R over the years has made it possible for us to install packages from GitHub. This is how you would install the package from the team's GitHub. I have already installed the package, so I can just load it with library(). So in this demonstration, and it's very much what happens in my app as well, imagine that I have a hundred-pixel by hundred-pixel area, and each of these pixels has a prevalence value. So in total I have 10,000 prevalence values that I need to convert, first from zero-to-five-year-olds to two-to-nine-year-olds, and after that I can convert them to incidence. Okay, so let's create some random numbers from zero to one, right, since prevalence is a number
that is from zero to one. So we create 10,000 values here. And so here I'm gonna run a profiler just to let you have a taste, if you haven't seen what
a profiler looks like. I select the first way and the second way that I'm comparing, and I put them into my profiler. What I do is click the Profile dropdown, and you can see that there's "Profile selected lines" here, and I'll click it. It's gonna take a while, so meanwhile let me explain the two ways. The first way is to put those 10,000 values directly into the convert prevalence function that is provided to us by the package. And here I need to specify that the values that I put in are actually prevalence for zero-to-five-year-olds. Okay, the results of the profiler run have come up, but I'm just gonna explain
through the code first. Okay, for the first way I put the 10,000 values into the function directly. But I know that if I look at the source of convert prevalence in my console, this function created by the Malaria Atlas Project group relies heavily on for-loops, okay? So it makes the calculations one by one, first value, second value, third value, all the way down to the 10,000th value, and this is gonna take a lot of time. Okay, and we can look at the profiler to see how long it takes. So I'm like, no, I'm not
gonna let this happen; I'm not gonna let my user wait for 10 seconds just for this conversion to be done, okay? Here's another way that I propose: because the prevalence-to-incidence conversion is actually one-to-one and monotonically increasing, I use this to my advantage. So instead of running 10,000 values, I convert only 1,001 prevalence values, and these 1,001 values are very specifically zero, 0.001, 0.002, 0.003, all the way up to 0.999 and one. Okay, so this is a sequence of values. Once I've calculated these values, called pre_calc, I can use the quantile function: I put in my vector of prevalences, and I pull values based on this pre_calc vector. This all works thanks to the fact that the conversion is monotonically increasing. And because of that, I only need to put 1,001 values into the convert prevalence function, and you can see the result in the profiler. The profiler is great because it lets us know, line by line, the time taken to execute each particular line, okay?
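The same trick works for any one-to-one, monotonically increasing conversion; in this sketch, slow_convert is just a stand-in for the package's loop-based converter, not the real function.

```r
# slow_convert stands in for an expensive, loop-based monotone conversion
slow_convert <- function(p) sapply(p, function(x) 1 - exp(-3 * x))

prev <- runif(10000)               # 10,000 prevalence values to convert

grid     <- seq(0, 1, by = 0.001)  # only 1,001 inputs: 0, 0.001, ..., 1
pre_calc <- slow_convert(grid)     # the expensive call runs once, on the grid

# Because the conversion is monotone, quantile() linearly interpolates the
# pre-computed grid and recovers each converted value almost exactly
fast <- quantile(pre_calc, probs = prev, names = FALSE)
slow <- slow_convert(prev)
max(abs(fast - slow))              # tiny interpolation error
```

The finer the grid, the smaller the interpolation error; a step of 0.001 is already far below any practical uncertainty in the prevalence data.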
The graphic down here usually doesn't make a whole lot of sense, but the graphic up here is very, very useful. So what the profiler tells me here is that my first way of executing it takes 17,000 milliseconds, which means 17 seconds, okay, to fully convert the 10,000 values. But my second way takes 1,700 milliseconds, which means 1.7 seconds, to run the first line of code, and then running the second line using quantile takes 10 milliseconds. So in total, I achieve the same task in only one tenth of the time it would take if I were to just blindly
use the convert prevalence function that is provided to me by this group back in Oxford. And in fact, in the profiler you can also look into the data panel, and you can look specifically into the convert prevalence function to see which steps are actually taking a lot of your time, for example invert PF. Sometimes there's more information here, if you know what's going on inside the invert PF section of the function. These particular features really helped me a lot in my journey of coding R.
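If you'd rather launch the profiler from code than from the RStudio button, the profvis package (which is what powers that button) can be called directly; this is a generic sketch rather than the demo code.

```r
# profvis powers RStudio's Profile button; you can also call it directly
library(profvis)

profvis({
  # a deliberately slow, loop-based computation
  out <- numeric(1e4)
  for (i in seq_len(1e4)) out[i] <- mean(rnorm(100))
  # a vectorized equivalent, for comparison in the flame graph
  out2 <- rowMeans(matrix(rnorm(100 * 1e4), nrow = 1e4))
})
```

The result opens as an interactive report with the same line-by-line timings and flame graph described above.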
Okay, so just to close this section out, I want to show you that my conversions from both ways are actually more or less exactly identical, just so you know I'm not cheating you with my second way here. Okay, this is very much relevant to what I'm doing in my Shiny apps. The third demonstration
is also kind of relevant to a lot of my other Shiny apps that I don't get to present today. It's the idea that sometimes we rely on some external source to read in large tables, big CSV tables. Oftentimes when these kinds of things happen, for example, your Shiny app at the backend is reading in hundreds of thousands of rows of CSV, and then you do some data manipulation before you turn it into visualizations. Then you very much want to use packages like the tidyverse and data.table to do these kinds of manipulations. The R script supporting this demonstration is here, and I'm gonna go over to
the code in my RStudio. So in this demonstration I'm gonna show you a comparison with the tidyverse packages. The tidyverse packages include dplyr, which we know of, tidyr, and readr, another package that comes with the tidyverse. readr, obviously from the name, is what you use to read in a data table, for example a CSV. The data.table package is sort of the MVP package when it comes to big-data kinds of analysis. And then microbenchmark. Okay, so the first step of
my demonstration is that I need to be able to generate a fake data frame. In this case, I specify the number of columns and the number of rows that I want to generate, and then it's gonna generate a fake data frame for me that is full of numbers. I'm gonna create a results table; in this case I'm gonna run it first, because it's gonna take some time before I finish explaining what's going on in this chunk of code. In this chunk of code, basically, I take three functions: in base R there is read.csv, in the readr package there's read_csv, and in data.table I'm using the function called fread. So for the three functions from the three packages, I ask them to read, first of all, a CSV that has 10 rows of data. And then increasingly I ask them to read 100 rows of data, then 1,000 rows, then 10,000 rows, and then a hundred thousand rows of data. And at each of these steps, I record the amount of time that it takes to execute the function. system.time allows me to do that, and I record the timings into my
empty data frame there. Okay, great, that's done, just in time for me to plot the results. And we can also look at the table here. So you can see from the plot here that by 10 to the power of four, which means 10,000 rows of CSV, the performance of base R starts to be really not good. And then by a hundred thousand rows you see that the time taken to read the CSV went up to about 50 seconds. Your mileage may vary, because I'm using an eight-year-old computer right now; it does seem to take a lot more time, if you will, compared to using a much faster computer. For the readr package, you can see that up to 10,000 rows the performance is still not bad, but when it comes to a hundred thousand rows the performance tends to deteriorate. But when it comes to the data.table function, that is fread, you can see that the performance up to 10 to the power of five rows is still pretty good; it takes about 0.34 seconds to load that amount of CSV into our system.
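The comparison can be sketched like this; the file name and table size are illustrative, and absolute timings will depend on your machine.

```r
library(readr)       # read_csv()
library(data.table)  # fread()

# fake table to round-trip: 1e5 rows, 10 numeric columns
df <- as.data.frame(matrix(rnorm(1e5 * 10), ncol = 10))
write.csv(df, "temp.csv", row.names = FALSE)

system.time(read.csv("temp.csv"))   # base R, slowest at this size
system.time(read_csv("temp.csv"))   # readr, considerably faster
system.time(fread("temp.csv"))      # data.table, typically fastest
```

Wrapping each call in system.time and growing the row count by powers of ten reproduces the curves shown on the plot.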
we didn't need to rely on some sort of external sources of data, CSV file, for example, we can
actually store our data frame, big tables, for example,
into this format called RDS. So RDS is sort of like binary files that is specific to R, that
is used to store R objects. Okay, and you can use
an R write RDS function to write the data frame
into this file called, for example, temp.RDS. Okay, if you sort of
calculate the number of times that it takes to read RDS, you can see that it takes
only about 0,11 seconds to read this RDS format. So whenever it's possible, you should also try to use
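Here is a minimal sketch of the RDS round trip using base R's saveRDS() and readRDS() (the script may equally use readr::write_rds, which wraps the same idea).

```r
# the same fake table as before
df <- as.data.frame(matrix(rnorm(1e5 * 10), ncol = 10))

saveRDS(df, "temp.rds")        # write the R object as a binary RDS file
df2 <- readRDS("temp.rds")     # read it back, much faster than parsing CSV

identical(dim(df), dim(df2))   # the object round-trips intact
```

Because RDS stores the R object directly, there is no text parsing at all, which is where CSV readers spend most of their time.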
So whenever it's possible, you should also try to use RDS files to your advantage. Okay, just a quick demonstration of filtering. When it comes to filtering, I am comparing four types. The first one is base R filtering. The second one uses the dplyr filter function. The third one is where I convert my data frame to a data.table object first, before using the data.table way of filtering. And the fourth one uses a data.table object directly, with the data.table package doing the filtering. It's gonna take a while as well, which means I get to have a breather before we talk about it. But you can also see here the
performance of the packages compared to base R. In this case, you don't get as much of a performance improvement as with reading CSVs, for example, but you're still looking at, say, the mean time of running the dplyr filter function, or the mean time of the data.table way of filtering, being about a third to a quarter of what you would get with base R. So if your Shiny app is using a lot of these kinds of manipulations behind the scenes, you really, really want to consider using the tidyverse or data.table to your advantage.
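The four approaches can be sketched as follows; the data frame and the filtering condition are illustrative.

```r
library(dplyr)
library(data.table)
library(microbenchmark)

df <- data.frame(x = runif(1e6), y = runif(1e6))
dt <- as.data.table(df)  # data.table object prepared in advance

microbenchmark(
  base      = df[df$x > 0.5, ],            # base R subsetting
  dplyr     = filter(df, x > 0.5),         # dplyr filter
  converted = as.data.table(df)[x > 0.5],  # conversion cost included
  datatable = dt[x > 0.5],                 # pre-converted data.table
  times = 10
)
```

Note that the third variant pays the as.data.table() conversion on every call, which is why converting once up front, as in the fourth variant, is the fair comparison.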
Okay, so these are the three points about efficient coding that I feel very strongly about and that I have been applying to a lot of the Shiny apps I've created. I hope these small demonstrations are useful to you when it comes to coding efficiently in your Shiny app, or just general R coding. The next part is very much
related to the second demo that I showed you just now. We precalculate things and we store them; in this case, once you store that particular pre_calc, for example, you can keep on using it over and over again until my user is tired of using my Shiny app, right? You do one-time calculations and you can keep on using the results. That is the philosophy in a lot of our Shiny app development. It's just extremely helpful for us to write out our workflow and then analyze which parts of the workflow can actually be stored away, or pre-calculated, if they're taking too much of our time. So just to look back at
the health facilities app that I showed you just now. Here, for example, for the first functionality, here is the workflow that I created in my mental model, based on what I thought I wanted to add to my app. First of all, I load the data and then I fit the GAM model; both of these are really, really fast. Then I ask the user for new coordinates, and I calculate new travel times based on the new coordinates. This is the part where I wasn't too sure, because it could have taken quite a bit of time, but it turns out that these calculations are actually pretty fast, so I'm kind of lucky that they are not very long. And then finally I produce new predictions using the GAM. It turns out that for GAMs, GLMs, and some of the other linear models, when you use the predict functions that they provide, it's actually just a bunch of matrix computations behind the scenes, so these functions are generally very, very fast, even when you have tens of thousands of predictions that you want to make. So in my Shiny app, I decided that loading the data and fitting the GAM model didn't make sense inside the app itself. What I did was store the GAM model away, and then when I need it, I pop the pre-fitted GAM into my Shiny app and use it for predictions in the later part of the workflow. And this is sort of the workflow that I decided on for my Shiny app. Just to show you how you
can store your models, just in case you're not sure how to do it. For example, I use mgcv to fit my GAM model. This is just creating a bunch of fake fit data, and then I fit the model using gam(). In this GAM I have a two-dimensional spline here. Once my GAM model is fitted, what I do is use the save function: I want to save this object called mod, and save it into a file called gam_mod.RDA. I'm not sure of the significance of the .RDA extension itself; I'm pretty sure you can use other extensions, but when I learned these functions, someone else just used .RDA, which seems to stand for R data, and that's what I use. I save it away, so now in my working directory there is a gam_mod.RDA file. And then when I need it, I can use the load function to load this particular file, and when I run mod, you can see that the mod object is now in my session.
storing your model away, and then pop it back up when you need it. Okay, we also have the
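As a minimal, self-contained sketch of that save-then-load pattern: the gam_mod.RDA file name comes from the talk, but the data here are simulated purely for illustration.

```r
library(mgcv)

# Simulate some data and fit a GAM with a two-dimensional spline
set.seed(1)
dat <- data.frame(x = runif(200), y = runif(200))
dat$z <- sin(3 * dat$x) + cos(2 * dat$y) + rnorm(200, sd = 0.1)
mod <- gam(z ~ s(x, y), data = dat)

# Do this once, offline: store the fitted model
save(mod, file = "gam_mod.RDA")

# Inside the Shiny app: restore the pre-fitted model and predict
rm(mod)
load("gam_mod.RDA")   # brings `mod` back into the environment
predict(mod, newdata = data.frame(x = 0.5, y = 0.5))

# Alternatively, saveRDS() stores a single object without its name,
# and readRDS() lets you choose the name on the way back in:
# saveRDS(mod, "gam_mod.rds"); mod <- readRDS("gam_mod.rds")
```

A point worth noting for the Q&A later: save()/load() restore objects under their original names, while saveRDS()/readRDS() handle one object at a time and let you rename it on load.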
Okay, we also have the optimization functionality; I have this in the health facility app. Again, this is roughly the workflow of my optimization functionality, and there's one particular step that takes forever: the spatial optimization. We use a genetic algorithm to do the spatial optimization, to find the best spots to place the new health facilities, and this takes an extremely long time. So what I realized was that I can pre-optimize and store these coordinates in a CSV file, and when I need them I just pull the pre-optimized coordinates and show them to my users. This obviously turns 30 minutes of fitting time into zero seconds, literally.
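Sketched in code, the lookup-table idea looks something like this. The column names and coordinate values here are hypothetical, and in the real app the table would be loaded once from the CSV file rather than built inline.

```r
# In the app this table would come from something like:
#   coords <- read.csv("preoptimized_coords.csv")
# Here we build a small stand-in, one row per pre-optimized site:
coords <- data.frame(
  n_facilities = c(1, 3, 3, 3),
  metric       = rep("travel_time", 4),
  longitude    = c(81.2, 80.9, 81.5, 81.1),
  latitude     = c(7.5, 7.2, 7.9, 7.4)
)

# Instead of running the genetic algorithm at request time,
# the Shiny server just filters the pre-computed table:
get_sites <- function(n, metric) {
  coords[coords$n_facilities == n & coords$metric == metric,
         c("longitude", "latitude")]
}

# Three facilities under the travel-time metric: three rows to display
get_sites(3, "travel_time")
```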
If you want to have a look, this is roughly how I organized that CSV file. This column is the number of health facilities: when my users say they want three health facilities, these are the three longitude and latitude pairs that I'm gonna display to them. And the same goes for the metrics: you can choose different metrics, and they will have different corresponding longitudes and latitudes to display. So in a way I'm turning my Shiny app from a big calculator into a not-so-big calculator, more like a visualizer. And I think this is extremely useful for many of us who often have to use things like optimization, or anything else that we know is going to stay stagnant; it's a very helpful way of designing our Shiny app around that. I'm not sure, are we very,
very on time here or? Would you spare a few more minutes? - I think, go ahead. If people need to leave,
feel free to leave, but we're recording, and
we can make it available for people that need to
leave, but keep going. - Cool, yeah. I'm just gonna address a
little bit about some issues. Although I say that predictions from a GAM are very fast, so we can just embed the model into our Shiny apps, sometimes our model can be very complicated, and getting predictions directly out of the model isn't that simple and fast. This is work that I've done with another group at the University of Florida. They came up with a very, very sophisticated and complicated model for the distributions of Aedes aegypti and Aedes albopictus; these are two types of mosquitoes known to transmit dengue, chikungunya, and I think Zika virus as well. They use a bunch of environmental covariates, and they also came up with a zero-inflated negative binomial model with site- and county-level random effects. So it's a very complicated model, and moreover they rely on a package called glmmTMB to fit it. The problem is that fitting the model takes hours. And the worst part is that if we were to use the predict function that comes with the glmmTMB package, there are a lot of calculations behind it to make extremely accurate predictions, including the uncertainty, and this takes an extremely long time. So in that case, this
is really tricky for us. But in a case like this, you want to realize that a model is essentially a bunch of estimated parameters, and sometimes what we really want to do is dig into the model, pull the parameters out of the glmmTMB object, for example, and calculate the predictions ourselves. This requires a little bit of understanding of the statistics behind the model, but I think it's extremely useful in this case, because we are turning one hour of prediction time into, again, essentially zero seconds, or one second, of calculation. This app is still under development.
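The pull-the-parameters-out idea can be sketched with a plain GLM so the example stays self-contained. The same principle applies to glmmTMB, where the conditional fixed effects come from fixef(fit)$cond and a zero-inflated model combines a conditional and a zero-inflation linear predictor; the data and model here are illustrative, not the Florida mosquito model.

```r
# Minimal sketch: recover predictions from a fitted model's
# coefficients via matrix algebra, instead of calling predict().
set.seed(1)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- rpois(100, lambda = exp(0.5 + 0.3 * dat$x1 - 0.2 * dat$x2))
fit <- glm(y ~ x1 + x2, family = poisson, data = dat)

# New covariate values the user picks in the app
newdat <- data.frame(x1 = c(0, 1), x2 = c(0, -1))

# 1. Pull the estimated parameters out of the model object
beta <- coef(fit)

# 2. Build the design matrix and compute the linear predictor
X   <- model.matrix(~ x1 + x2, data = newdat)
eta <- X %*% beta

# 3. Apply the inverse link (log link here, so exp)
mu <- exp(eta)

# This matches the packaged predict() up to numerical noise
all.equal(as.numeric(mu),
          as.numeric(predict(fit, newdata = newdat, type = "response")))
```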
So I'm just gonna show you briefly what has been done, to give you a feel for what happens when you pull the estimated parameters out, actually use them for the calculations, and compute your predictions yourself instead of relying on the package's built-in function. This can take a long time to load; let's hope that it doesn't. Well, it seems someone somewhere is not very happy with me. Okay, the Shiny app is not happy with me, so I guess I won't be able to show you, but the idea is the same: we have some sort of control panel that allows people to change the county, change the environmental covariates, and they get to see the predictions almost immediately. This, I think, is the kind of workflow that I very much recommend for people who are using a very complicated model. For example, sometimes we fit some sort of multilevel hierarchical model, which often involves storing all the posterior samples of the estimated parameters; then, in the Shiny app, we use these pre-stored posterior samples and just do the matrix computations behind the scenes.
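That posterior-sample trick reduces to a single matrix multiplication. Here is a sketch with simulated draws standing in for real MCMC output; the coefficient names and values are purely illustrative.

```r
# Pre-stored posterior samples: one row per draw, one column per
# coefficient (in practice you would load them with readRDS()).
set.seed(1)
n_draws <- 4000
beta_samples <- cbind(
  intercept = rnorm(n_draws, 0.5, 0.05),
  x1        = rnorm(n_draws, 0.3, 0.02),
  x2        = rnorm(n_draws, -0.2, 0.02)
)

# New covariate values chosen by the user in the app (2 scenarios)
X <- cbind(1, x1 = c(0, 1), x2 = c(0, -1))   # design matrix, 2 x 3

# One matrix multiplication gives the linear predictor for every draw
eta <- X %*% t(beta_samples)                 # 2 x n_draws

# Summarize across draws: posterior mean and 95% interval per scenario
pred_mean <- rowMeans(eta)
pred_ci   <- apply(eta, 1, quantile, probs = c(0.025, 0.975))
```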
When it comes to matrix computations, these calculations are very fast and generally don't take us a whole lot of time. So that is pretty much what I had to show you on today's topic. I'll close out my presentation with some final random tips that are close to my heart. I have been doing a lot of geospatial visualization, and I find that sf is generally faster than sp when it comes to shapefiles and whatnot, but sometimes we still rely on sp because it works better
together with rasters. High-resolution shapefiles can sometimes take a much, much longer time to render in plots and Leaflet maps. I didn't get to show you the app that I wanted to just now, but that would have been an example where the Florida shapefiles turned out to be too detailed, and I had to simplify my shapefiles using, for example, the gSimplify function from the rgeos package.
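For reference, here is what that simplification step can look like. The talk used rgeos::gSimplify, but rgeos has since been retired from CRAN, so this sketch uses the sf equivalent, st_simplify, on a toy polygon; the tolerance value is illustrative.

```r
library(sf)

# An over-detailed polygon: a circle approximated with 2000 vertices
theta <- seq(0, 2 * pi, length.out = 2000)
ring  <- cbind(cos(theta), sin(theta))
ring[nrow(ring), ] <- ring[1, ]   # close the ring exactly
poly  <- st_polygon(list(ring))

# Simplify: a larger dTolerance removes more vertices, which means
# faster rendering in plots and leaflet
poly_small <- st_simplify(poly, dTolerance = 0.05)

# Compare vertex counts before and after
c(before = nrow(st_coordinates(poly)),
  after  = nrow(st_coordinates(poly_small)))
```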
And finally, finally, this gave me a heart attack a few hours before this presentation: something went wrong with the code in the Shiny app that I showed you, the health facilities Shiny app. It turns out it's because of the filter function, and likewise the select function: there are multiple packages that have functions with the same name, and sometimes R has no idea which function you are referring to. So do remember to bullet-proof your code with a prefix such as dplyr::, for example in front of your filter and select calls.
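A quick illustration of why the prefix matters: base R's stats::filter() and dplyr's filter() are entirely different functions that happen to share a name.

```r
# Base R already ships a filter(): stats::filter(), a time-series
# filter. dplyr exports a completely different filter() for picking
# rows. If both are on the search path, a plain filter() call means
# whichever package was attached last, which can silently change as
# your app's library() calls evolve.

# Explicit namespacing removes the ambiguity:
stats::filter(1:5, rep(1/3, 3))   # moving average (NA at the ends)

# The dplyr versions, written unambiguously (require dplyr installed):
# dplyr::filter(df, x > 3)        # row selection
# dplyr::select(df, x)            # column selection
```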
That's pretty much it for the presentation. Thank you very much. And, yeah, here are some of my contact details. Feel free to send me questions, comments, or suggestions, and if there are any questions now, I would like to hear them.

- All right, thanks Ben
for that presentation. So there are at least a few questions so far, the first one being, well, it's more of a comment: there's also the rollapply function, which can also be used to calculate moving averages.

- Yeah.

- The first question is: is there a way to use the profiling tool to measure the speed of functions in Shiny? So, for example, quantifying reactive changes to a map or a graph, or something like that.

- I'm not aware of using
a profiler directly with Shiny apps. I think it was in last week's presentation: there are ways to test your Shiny apps against certain preset inputs, and I think that would be the best way to profile, taking each input one at a time and profiling that code to see what happens. Yeah, I'm not aware of any great way to use a profiler directly with Shiny apps.

- Okay, and another question asks, what's the difference
between a .RData file and a .RDS or .RDA file?

- Yeah, that is a very good question. I've definitely asked that myself before, but I never quite found the answer. My gut feeling is that RDS and RDA both work much the same way, so you can most likely write the model to an RDS file without problems. Yeah, that's as much as I can say about it.

- Okay, and another
question asks if you know whether there's a speed difference between plotting methods, for example ggplot2 versus base R or other plotting methods.

- Not so much. I feel like I haven't seen a particular difference between ggplot2 and base R plots. But I do want to say that there are certain things you do that can take longer in one plotting system compared to another. For example, small things like the choice of colors or the size of the dots actually make a difference.

- Okay, well, I think those
are all the questions. Oh, there's one more: one person posted a link saying there's a profiler for (indistinct) that works with Shiny, although it hasn't really been tested much, or they haven't tested it much.

- Oh, great, yeah, that's great to know. (keyboard clacking drowns out speaker) Shiny. Oh, yeah, profiling in Shiny. Oh, that's great. Yeah, that could be one way to do it. Yeah, right, so you actually have the run examples that you can (indistinct) through, that's right. So I think it was last week's presentation, or two weeks ago, that, yeah, you can use an example to test your Shiny app, and then you can add a sort of wrapper around the (indistinct), yeah. Thank you so much for the comments.