- Everyone, I'm Caitlin. I'm going to be doing a little coding tutorial today based in R and Shiny. For those that haven't worked with R or Shiny, we have linked on the page some other videos introducing R and Shiny separately, so you do need a little background in those two to feel comfortable with this video. We've also linked a bunch of really fabulous free R resources, if you want to really dive into R not just for data viz but for data science and programming in general. I really wanted to focus on a hands-on sort of example of how you can work with actual COVID-19 data that's out there and public. There's a lot of it out there. Not all of it is created equal, unfortunately; some of it is so messy it's hard to even explain how messy it is. We're going to start with a more friendly example from the New York City Health Department. Since I'm in New York City, and Rockefeller's in New York City, this is a more personal kind of look at COVID-19. We're gonna start on GitHub, actually, with the source of the
data I'll be using today. For those that haven't worked with GitHub before, it's a really phenomenal place for the broader data community. The format is essentially that what are called repositories, or repos, are shared and stored on here. So this is the username for the Health Department, and actually I don't know whether they have other data here; we can take a quick peek. Oh, it turns out it's just COVID data. A fun thing about COVID for a lot of health departments is that the need to share data with the public arrived urgently and fast, and a lot of health departments had no capacity to do this in any way, let alone handle and collect the data. New York City in general, across various departments, has pretty good open data resources; I totally recommend checking out those data sets, because there's a lot of really fun stuff you can learn about the city that way. COVID hit everyone very fast, and the need for information, and to talk about, visualize, and analyze that information, came just as fast. New York City had the capacity to ramp up, but a lot of places didn't, and it took them a while to get to a place where they could share data. As a fun scavenger hunt on your own time, you can try looking at data from other states' counties to get a sense of how variable this looks. This is the repository we're
going to work with today. There's a lot in here; we're not mapping every single thing. The game plan will be to create a map of the city, broken down by a modified form of zip code, which we'll talk about in a bit. On top of that map, obviously, we care about COVID data and the spread of COVID-19, so we'll be looking at a few different metrics, the most basic one being case rate: cases appearing per a certain number of people in the population. Take a minute to look at the data itself and what it is. I talked
a lot in the last video about how you really have to be familiar with your data before you even begin to think about visualizing it. The NYC Health Department has their own set of visualizations on their data webpage. We're going to make something similar to that, but we'll be focusing on just a handful of metrics, whereas I think they're showing a bit more. Already you can see, and this is mirrored on the GitHub as well, how they're defining cases. What is a confirmed case? People with a positive molecular test. Probable cases: people with a positive antigen test. So right there, there are different ways in which we're assessing someone
having or not having COVID. You're probably familiar with the many different types of tests that are available: ones detecting the presence of an antigen, ones based on PCR. There are many different ways to actually detect COVID-19, and because of that, there are many ways to define someone as having a case or not. So we're going to make a kind of version of this. This is a static map, what's known as a choropleth map. So that just means that
regions are colored by a data value. In this case, it's a seven-day percent positive. So what does that mean? The percent of people who had a test who tested positive. That's actually quite a strict definition, right? This is not the percent of the full population that tested positive; this is the percent of the people who actually had tests who tested positive. And they have this really important note here that all data are incomplete, because we're not testing every single person. And the other feature
here: this is a seven-day number. So what does that mean? It means these numbers are being aggregated on a weekly basis. Individual days can really mess up data collection, so to correct for that noise, and to avoid attributing too much interpretation or weight to any one given day, you average across seven days to calculate a percent positivity rate. You can imagine if this were a daily percent positive: say, somewhere down here in the Lower East Side, in one of these zip codes, three days of data were input or collected, for some reason, on one day instead of split across those three days. If someone were looking on one day, then coming back to this map the next day and seeing a region all of a sudden become a much, much darker color, they could interpret that as: oh my God, something happened on this day to cause some huge burst in percent positivity. Which wouldn't be the truth, right? It's really just a slower reporting cadence dropping in all those new cases at once. So this is why we have the weekly averages, or aggregations I should say, of percent positive. It's more stable and reflects
change happening in real time. Some people actually extend this even further, to a 14-day window for example, to further reduce the potential noise. We just had the holidays in December, and that was actually a huge problem with reporting in a lot of states and counties: the weekly averages were susceptible to it, because there was this longish period, some people need more than a week of vacation, when people weren't getting tested. There were backlogs on the reporting end, and there were these artificial drops in the numbers, which people had to clarify for their audiences when publishing data around this time. Hopefully they did; some of them, I'm sure, did not.
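To make the smoothing idea concrete, here's a tiny sketch in base R: a made-up daily percent-positive series (invented numbers, not NYC data) with one reporting-backlog spike, smoothed with a seven-day trailing mean via `stats::filter`.

```r
# Made-up daily percent-positive values with one reporting-backlog spike (the 15).
daily_pct_pos <- c(4, 5, 4, 6, 5, 4, 5, 15, 5, 4, 6, 5, 4, 5)

# Seven-day trailing mean: each value averages the current day and the six
# days before it (the first six entries are NA, since there's no full week yet).
seven_day <- stats::filter(daily_pct_pos, rep(1 / 7, 7), sides = 1)
round(as.numeric(seven_day), 1)
```

The spike dominates the daily series but is diluted across the weekly window, which is exactly why the aggregated numbers are more stable.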
This is kind of like the map we're going to make, but we're actually going to do it over the course of several months, starting in August: this seven-day calculation didn't actually start happening in New York City until early August. Jumping back over to the GitHub repo: if you look at any GitHub repository, you often will encounter
a README markdown page; that's what the MD stands for. This one is pretty awesome in terms of how well explained the data is. It even goes into detail about how to download the data and how to make requests if you have questions; you can submit an issue on GitHub. There are also updates to the way the data is being described, or to the new types of metrics being included at different points, because these things happen. For example, they changed from "PCR test" to "molecular test", right? That's a technical distinction, but they get into this stuff. There are so many different facets to talking about COVID, and they really do a nice job here of explaining the data. On your own time you can definitely explore the different types of data here, but we're going to focus on trends, since we're going to look over the course of several months. And so in this folder, we have lots of CSVs. So that's comma-separated values.
The department recommends against interpreting daily changes as one day's worth of data, due to the difference between date
of event and date of report. So that's really key too. It's something I didn't actually
mention of people associating an event being recorded on a given day as it actually happening on that day, but think about in real life,
if you've tested for COVID or have had COVID and that testing event
could also be lagged between when it's actually
reported to the system. There's not this kind of
like instantaneous thing where you get a positive
test in some systems, somewhere on a server it gets updated to indicate your positive COVID case. So that's an inherent flaw with what these types
of reporting weekly data so we're going to be focusing on. And so there's a lot of data in here, antibody testing, case rate, which is one of the ones
we're going to look at; cases by day, which has all of these different metrics. And these prefixes here are actually sorted by borough, because people wanted to compare, for New York City, the Bronx versus Manhattan versus Queens versus Staten Island. These were important, especially for getting broader-area aggregations of hotspots and things like that. Deaths by day. Hospitalizations by day. Percent positive, again, is one we're going to use, and then we're also going to take test rate. If you notice, the three data points I'm mentioning are the ones followed by MODZCTA. ZCTA is a specific type of geography, right? We can talk about the whole city; we can talk about boroughs, as I mentioned before, which are sub-regions within the city; but then we can talk about zip codes. MODZCTA is actually not exactly zip codes, which I'm going to talk about in a bit, but it's a way to map regions defined by different zip codes. It's a much smaller region, to get a sense of COVID spread at that finer scale. So much like this map, right? These are the MODZCTA
regions I was talking about. This is much better than just comparing full boroughs. Why? For those that know their boroughs in New York City, you can look within these big boroughs and see that there is actually a lot of variability, even across adjacent zip codes. That is really important for informing a public that is concerned about travel, or concerned about quarantines and transmission of the virus, and all of this stuff, right? People want to be informed about the severity of the disease. This is also helpful for people on the government side who are trying to figure out where to prioritize resources, if COVID is spreading more severely in one region versus another. This affects hospitals too, right? Hospitals are either distributed evenly across the city or not, which is definitely not the case, and if a hospital is here, it's gonna need more resources, probably, than a hospital over here. So that is why we're looking at these smaller regions. Again, I'm sort of, you know,
filtering and making decisions about the viz we're going to make today, since this is more of a tutorial, but there's a lot of data here, and you don't have to do exactly what I'm doing. This is meant to teach how to make a specific type of visualization, and it's going to be interactive as well: in this one, when you hover over a zip code, you get some information about it. Whatever you do, if you're working with this data, or if you're working with a different dataset, make sure you understand what the data is. This is super detailed. We're doing the seven-day aggregations as well: people with molecular tests are aggregated by week, from each Sunday to the following Saturday, and that's how they're classed and logged. So again, this is seven-day aggregations. We want to download this data, and to do so there are actually a couple
different ways you can do this. If you have experience with the command line, you can interface directly with GitHub, using Git commands (G-I-T) to pull this repository. Otherwise, you can download it manually: literally clicking Code here, you can download the zip file. Additionally, you can clone the repo, and if you're working within GitHub yourself and have your own GitHub account, you can do what's called forking the data over, creating essentially a copy of it and branching it off into your own space to work with. For the way we're going to do it today, I'll show two different ways using R. You can copy the link address here where it says Download ZIP; that's right-clicking, or command-clicking, and choosing Copy Link Address. The other thing you can do, with RStudio, is use the address ending in .git when you load RStudio. So now that we're in R: if you don't know some of the terminology I'm using about R or Shiny, definitely check out
those other intro videos I've done introducing the language and how to work with some basic stuff. I'm starting off with a pretty much clean workspace. One thing you can do, if you like to use R projects, is start a session in a GitHub repo. For example, if you go here (I'm not actually going to do this), when you make a project you can give it its own directory and set up a new working directory. But if you want to start it with a Git repo, you go to Version Control; the main benefit of using GitHub is that you can track versions over time. You click Git and get a repo URL field: that's the .git address I was showing you on the NYC Health GitHub page. You drop it in here, give it a name, and you can put it wherever you want on your computer. So that's an option too. The way I'm doing it here is a little bit more sort
of direct, within the script. I'm calling this script NYC COVID prep data.R, and you'll see why in a second. This is how I'm downloading the file. First, of course, I'm setting my working directory: wherever you want to work on this project, make a clean directory, a clean folder, for it, and set your RStudio session to initialize in that working directory. Once you do that, you download the file: the URL for the actual zip goes here, and then you set a destfile, which is going to be the name of that downloaded file. And that name is the repo in its entirety, data-master.zip; that's going to be the folder that pops up. I can run this super quickly,
and it downloaded that zip. Then we can actually just unzip it, now that it's there. I already moved to where my directory is, and I've already done this before, so you're going to see the files already there. But the data is updated all the time, so if you wanted to do it over again, you'd re-download this zip, which is what I just did, and then unzip it in the same location, and it all unzips there. And so it did that.
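The download-and-unzip step just described looks something like this. A sketch: the URL here is the "Download ZIP" address in the standard GitHub archive form, and the destfile name is just a local choice, so check the link you actually copied from the repo page.

```r
# Download the zipped repo into the working directory, then unzip it in place.
url <- "https://github.com/nychealth/coronavirus-data/archive/master.zip"
download.file(url, destfile = "data-master.zip")
unzip("data-master.zip")
```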
Let's take a quick look to make sure it's all there. Yes, it is; we have our folders. The trends folder is where we're gonna grab that percent positive by MODZCTA, test rate, again by MODZCTA, and then case rate too; those are the three we're going to grab from this folder. So on top of any R Markdown or
R scripts, whatever it is, we want to initialize some libraries. Tidyverse is how I work with data; it's my general data-wrangling recommendation for everything, and it includes ggplot, it includes dplyr, all that good stuff. Vroom is a much newer package; it allows really fast reading, importing I should say, of data, specifically CSVs for the most part. It can recognize delimiters and import stuff pretty quickly on that front, so that's awesome. sf and tigris are packages for working with spatial, geographic data, so those are important. I'm listing here, in comments, why we're importing all of this. So quickly, start all of these up.
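For reference, that setup step would look something like this (assuming the packages are already installed):

```r
library(tidyverse)  # data wrangling and plotting: dplyr, tidyr, ggplot2, ...
library(vroom)      # fast reading of delimited files such as CSVs
library(sf)         # simple features: reading and handling spatial data
library(tigris)     # census geographies, plus the geo_join() helper
```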
Now these are all set up. And vroom, like I said before, is a way to read in data; I like vroom for CSV files that are pretty straightforward. Usually I look at the delimiter, right? So what I'm doing here is reading in these data frames, and as you can see, I vroomed in these CSVs: it recognizes each is a comma-separated file, so a comma delimiter, and it's linking in all of these different values, the columns. And so these are three separate data frames that I've just generated.
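The three vroom calls look roughly like this. The paths follow the repo's trends folder as I see it, so check your own unzipped copy for the exact file names:

```r
library(vroom)

# vroom() auto-detects the comma delimiter and the column types.
case_rate <- vroom("coronavirus-data-master/trends/caserate-by-modzcta.csv")
pct_pos   <- vroom("coronavirus-data-master/trends/percentpositive-by-modzcta.csv")
test_rate <- vroom("coronavirus-data-master/trends/testrate-by-modzcta.csv")
```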
Let's take a look at case rate first, just to see what this data looks like; that's always the first thing you want to do. And there are a lot of columns here. So this is the raw data from the city. Remember, I mentioned that this type of seven-day aggregation didn't really start until August of this year, right? A lot has changed since August, so it's fun, fun or depressing, to look at the trends in time, even just over these past few months. So right, we have week ending, and that means the cases reported in these subsequent columns are reported from the Sunday before through the end date that's listed here; that's why it's week ending. And we have data all the way up to the fresh start of 2021, so that's our last week to compare. Going across, these are pretty nicely labeled columns. We have case rate over the city. And so why is it a rate, and
why is it not just a raw number of cases? That data does exist in the repo, if you want to take a look at the raw numbers of cases, but this rate, as the GitHub README mentioned, is calculated as the number of cases per 100,000 people. If you remember from my last video: why is this happening? We want a rate that includes information about population, some sort of per-capita measurement, because we really need to normalize the numbers. There are many different numbers of people in different parts of the city; the population is concentrated differently. And so you can't compare any region to another unless you do some sort of normalization for the population. This is why we're looking at case rates here, as opposed to just numbers of cases. So we have case rates across the city.
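A quick toy example of why the per-100,000 normalization matters (invented numbers, not NYC data): the same raw case count means very different things in areas of different population.

```r
cases      <- c(area_a = 200, area_b = 200)
population <- c(area_a = 20000, area_b = 200000)

# Cases per 100,000 residents: same raw counts, very different rates.
case_rate <- cases / population * 100000
case_rate
# area_a = 1000, area_b = 100
```

Once population is accounted for, area_a's rate is ten times area_b's, even though both reported 200 cases.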
New York City does not have the same kind of COVID spread everywhere; it's super regional. And so we have borough breakdowns, and then we get into the zip codes, 10001 all the way to 10280, and this is cool, right? This is a lot of data in here. And the data looks clean; there aren't a lot of weird errors with the formatting of the data. A good thing to always check when you get a new data set is the class of the different variables. Yeah, we have doubles here, which is good; these are just numeric data. The date, however, is actually a character, which is fine; there are a couple of things we're gonna do, we'll see in the code, to change that. So yeah, double-check what your data structure is before you even do anything with analysis. The other point I wanted to make is: remember, this data
file was by MODZCTA. So let's talk about what MODZCTA is, because it's a little confusing. Again, if you have a question about the data, digging into the actual README for the GitHub repo is a really good place to start; the data here is accompanied by this lovely explanation. MODZCTA stands for Modified Zip Code Tabulation Area, and ZCTA is just Zip Code Tabulation Area, because a zip code, as it's explained here, isn't actually a domain. It's not like a county or a state; it's actually a collection of points that make up a mail delivery route, which is not something
we typically think about. But when we're talking about regions in space, and when we're dealing with mapping in R, really any software mapping for viz, we need to actually understand what the regions are bounded by. There are these issues as well, with some areas, like individual buildings, having their own zip code, and weird stuff like that. Essentially what the Health Department did is create modifications to these ZCTAs that take those tiny little zip codes, affiliated with maybe a single building or something, and combine them into a slightly bigger region, right? That makes the estimates for population size more legitimate, because dividing by a hundred thousand is a good normalization across the board, but doing it for a place that has a tiny population might not actually capture the regional spread of a disease. So you want to keep your geographies fairly comparable in terms of size. That's maybe an aggressive level of detail for mapping, but this is the kind of thing you run into, especially when you're dealing with geographical data. So to deal with this, they offer
this ZCTA-to-MODZCTA file. Just take a quick look at that, and you'll notice that for the most part it's a one-to-one mapping; it's like, oh, why did they even bother doing this? It's pretty much uniform for everything, until you get down here. Look at this: the MODZCTAs are on this side, and this one MODZCTA is encapsulating several different zip codes, because those are tiny, tiny little zip codes, not comparable to a more standard-sized zip code. The way we're going to handle this is to use the MODZCTAs, but if you wanted to, to make a more enhanced visualization, you could create a mapping system within your app to convert over from ZCTA. A little added complexity: the way we're going to do this in our final app is to include this table for lookups as well. So if you live in a zip code that's maybe one of these smaller ones, you can go check and make sure it's mapping over yourself. So these are the files we're
actually going to read in. This geography source is exactly the same as what's on the GitHub repo, just now it's in the comfort of my little RStudio space. There are a couple of different ways in which shape, or geography, data can be stored. The one we're gonna work with today is the shapefile, the SHP files, from 2010. That's the last time these kinds of domains were defined, and this is getting into, you know, infrastructure stuff in the city and how these lines are drawn, especially for things like, I don't know, precincts being redistricted, county lines, congressional districts, all this stuff. Depending on what you end up doing with this in the future, you can get really, really far into shapefile data. But for now, very simply, we use this st_read function to read in the shapefile. This is what's necessary to read in geographic data, and the function is part of the sf package that I loaded initially. A simple feature collection with 170 features and two fields: this is how geometry data is stored. These are literally the lines upon which we're drawing these boundaries on a map, you can imagine, and this is how R is storing that data right now.
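The shapefile read looks like this. The path reflects the geography folder in my unzipped copy of the repo, so check yours; printing the result shows the simple feature collection just described.

```r
library(sf)

# st_read() reads the shapefile into an sf object: one polygon per MODZCTA,
# with the boundary geometry stored in a geometry column.
modzcta_shapes <- st_read("coronavirus-data-master/Geography-resources/MODZCTA_2010.shp")
modzcta_shapes
```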
Now that we have our data, we wanna work with it a bit to get it into a space where we can visualize it. Checking this out, we have all the data we could ever want, but it's not exactly in the right format to work with. Why is that? Currently we want to map by zip code, or ZCTA I should say, but right now that zip code data is spread across this data frame. In a classic data cleaning, or should I say data wrangling, situation, we're going to be shifting the format of this data frame into a long format. What does that mean? Instead of having all of these variables spread out in this wide format, where for every week we have the different zip codes going across this way, we want a situation where we have a column that is zip code, and the case rates are aggregated lengthwise, down the frame, as opposed to across the zip codes this way. This will be just a few columns, as opposed to one column per zip code. The first thing we're going to do is actually remove what we're not interested in: the borough data and the city-wide data. So we're going to remove these first few columns, which this line does, and I'm calling it case rates now, just to give it a new variable name. So let's take a look at what
that is after I do that. So now we've gotten rid of the stuff that was here, the city-wide and the borough-wide columns, and we just have zip code data. Awesome. Now the reshaping: tidyverse actually introduced, very recently, a new set of functions, pivot_wider and pivot_longer. It used to be gather and spread; I think the new language is a little more intuitive. But essentially what I'm doing, right, is moving into the long-form format, so pivoting longer. I'm taking all of these columns, 178 columns (actually, I think I said 22 before; that was wrong, 22 is the number of rows), and we're going to put them all into one column called MODZCTA. The prefix for that is going to be the case rate one, right? That's the prefix encapsulating all the zip code data I want, and it's going to be removed: we don't want values that say "case rate underscore", we just want the actual zip code itself. What will the values be called? A new column just called case rate. So currently it's individual case rates; switching to this long format, it's case rates long. And just like that, we have our weeks, we have our MODZCTA, and we have our case rates, and this is a lot easier to work with now that we will be
trying to visualize it. So now that we've done that, we're going to do the exact same thing to the percent positive and the test rate data frames. Just to prove to you that these are almost identical in format (definitely check before you do this), we're gonna look at the head of this, just to show you it's identical, right? So instead of case we have PCTPOS, percent positive; the city and borough columns are here in the front, and then it gets into the zip codes. It's literally the same format, and that means it allows us to use almost the same code. Definitely check first, of course, and the same goes for test rate. I should say again, the reason why it's a rate is that these numbers are calculated per a hundred thousand people; percent positive, again, is the number of molecular tests that ended up being positive, out of the people that got those tests. Okay, so I'm just gonna run this next batch of code, and then we will have our reshaped, long-format data frames.
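Here's the reshape on a tiny made-up wide frame, so you can see what pivot_longer is doing. The CASERATE_ column-name pattern mirrors how the real files are described, but the numbers here are invented.

```r
library(tidyr)

# Toy wide frame: one row per week, one CASERATE_ column per zip code.
case_rates <- data.frame(
  week_ending    = c("01/09/2021", "01/16/2021"),
  CASERATE_10001 = c(300.1, 280.5),
  CASERATE_10002 = c(512.7, 498.2)
)

case_rates_long <- pivot_longer(
  case_rates,
  cols         = starts_with("CASERATE_"),
  names_to     = "MODZCTA",
  names_prefix = "CASERATE_",  # strip the prefix, keep just the zip code
  values_to    = "case_rate"
)
case_rates_long
```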
And now that we have those three data frames, we're going to merge the three of them together. To do so, tidyverse again gives us some pretty cool tools: left_join. With left_join, if you're imagining A on the left and B on the right, and we want to join those two data frames, we're matching based on values: all the values that exist in that left data frame are preserved, and then the ones on the right are matched over. We already know in this case that all three data frames are actually the same in terms of week ending and MODZCTA, for example; those are all the same. Definitely double-check before you do this, because that's going to affect the type of join you do. In this case we're matching to the leftmost one, and then of course you can pipe this: so we're first matching case rates long to percent positive long, and then, upon that, we're joining in test rates long. This is a really easy place to mess up your data frame, depending on what values you're matching on.
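The piped joins would look something like this. A sketch: the frame and column names (case_rates_long and friends, week_ending, MODZCTA) follow my earlier sketches, so substitute whatever you named yours.

```r
library(dplyr)

# Chain two left joins, matching on the columns the three frames share.
# Assumes case_rates_long, pct_pos_long, and test_rates_long already exist.
all_data <- case_rates_long %>%
  left_join(pct_pos_long,    by = c("week_ending", "MODZCTA")) %>%
  left_join(test_rates_long, by = c("week_ending", "MODZCTA"))
```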
So once I run that, now that I've merged them all in, let's take a look at what this looks like, and this is exactly what I wanted. Now we have all three in one lovely little data frame, all merged on week_ending and MODZCTA: case rate, percent positive, test rate. But it's missing something: that MODZCTA shapefile. So that geography data, if you remember, looked a little something like this: yeah, feature collection, geometries, polygons, all of that. You'll see this multipolygon thing for a given MODZCTA. The two data frames have something in common, and that's MODZCTA. But to join them while preserving the geometry data, we can't just use a standard tidyverse join; you have to use something called a geo_join, which is a cool function from the tigris package. And so, running this, now we've joined the two data frames. Now we have a geometry file, but it includes our week ending, our case rate, our percent positive, and our test rate. And this is the data frame that we really wanted.
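The geo_join step would look something like this. A sketch: by_sp and by_df name the matching column in the spatial object and the data frame respectively, and the object names here are the ones from my earlier sketches, so adjust to yours.

```r
library(tigris)

# Join the attribute data onto the shapes while keeping the sf geometry.
all_modzcta <- geo_join(modzcta_shapes, all_data,
                        by_sp = "MODZCTA", by_df = "MODZCTA",
                        how = "inner")
```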
We're going to make one more change before saving it, but this is looking really great. Again, here I added a note in the code (this code we're going to put up on the website): we're using MODZCTA here, not technically zip codes, even though oftentimes the two are treated as interchangeable; the way we're going to handle it here is leaving the lookup table for people to double-check. And remember how I mentioned that the class of the date column is character? We're going to make that a date, just because it allows us to track things over time if needed. Typically, if dates are there, you want them classed as dates, unless you have some specific reason to class them otherwise, and this is easy to do with the as date function. And then finally, I'm going to save this in RDS format. This is the full, final data frame, now stored in RDS format. Before we build our app, though, we want one more check.
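That cleanup-and-save step would look something like this. A sketch: the object and column names follow my earlier sketches, the date format assumes month/day/year strings (check yours), and the output file name is just a choice.

```r
# Convert the character dates to Date class, then save the merged frame
# so the Shiny app can load it later with readRDS().
all_modzcta$week_ending <- as.Date(all_modzcta$week_ending, format = "%m/%d/%Y")
saveRDS(all_modzcta, "all_modzcta.RDS")
```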
What's happening within this data? What is going to be the story of the data? To do so, focusing on case rate first: it's a good idea to check the distribution of anything you have that's numeric. So for case rates, for example, all I'm doing here is plotting case rate, as a numeric of course, in just a simple histogram, to look at the range of the data. You can see that the part of the range of case rates extending out to here is actually almost imperceptibly small, but the fact that the x-axis runs out to 1500 tells you the extent of the case-rate range.
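A minimal version of that distribution check (object and column names follow my earlier sketches, so adjust to yours):

```r
library(ggplot2)

# Quick look at the spread of case rates across all MODZCTA-weeks.
ggplot(all_modzcta, aes(x = as.numeric(case_rate))) +
  geom_histogram(bins = 50) +
  labs(x = "Cases per 100,000", y = "Count of MODZCTA-weeks")
```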
Another way to check this is to do min and max. I'm going to actually look at max first: so the max of all my MODZCTA case rates is in the 1400s, and that's why this is extended all the way out here. Just looking at this by hand, you'll see the vast majority of the data is in this space, but it is extending out to a lot of cases, actually, so that's not great. Now we've gotten a sense of the range here. And again, definitely do this for the other metrics as well; for example, take the max of the others and check out some of those other ranges too, and you can repeat the same thing to get a sense of spread across everything. And this is important: the reason why getting a sense of
the distribution of the data is a great idea is just so you know how to visualize it, right? When we're doing coloring and making design choices, we want to actually represent the spread of the data, to cover the range of things. And so, when we're thinking about making a simple version first of what the app will eventually be built around, we want to use leaflet. Leaflet is this really phenomenal, phenomenal package for R; it's actually Leaflet, a JavaScript-based interactive map library. Checking out the documentation for leaflet, which I totally recommend, there's so much you can do with it. Typically you would code this in something like JavaScript, but the leaflet package for R lets you get around that and use the suite of interactive tools anyway. We want to actually set these labels, using the htmltools package and the sprintf function, which let you set the
formatting for the pop-up that will be over each
one of those regions when we hover over them. What we're doing here is we're actually using strong tags, like sort of bold formatting, for the zip code itself. Or I should say the MODZCTA will appear on top, and then there'll be a line break, and then that'll be followed
by the case rate number itself. So things will be shaded and aggregated, but if you wanted to actually
see the actual case rate per, so that's per a hundred
thousand people, right? Case per hundred thousand
people, you could hover over and you would get the actual
case rate for that MODZCTA. So let's just run this to initialize that, and I'll show you where that comes out when we build them up. And then we're going to
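As a sketch, that label-building step might look like the following (assuming htmltools is installed and the data frame and columns are named `all_modzcta`, `MODZCTA`, and `caserate` — adjust to your own names):

```r
# Build one HTML label per polygon: bold MODZCTA on top, line break, case rate.
library(htmltools)

labels <- sprintf(
  "<strong>%s</strong><br/>%g cases per 100,000 people",
  all_modzcta$MODZCTA,   # assumed column holding the modified ZCTA code
  all_modzcta$caserate   # assumed column holding cases per 100,000
)
labels <- lapply(labels, HTML)  # mark strings as HTML so leaflet renders the tags
```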
set our color palette. So for leaflet, this is
actually like a function we set. So we have a couple different
options to do this: colorBin and colorNumeric. I'm doing colorBin for this; this
is really an aesthetic choice. This is not something where there's only one way to do it; I'm using colorBin just because I want to make it so that it's kind of like less
work to kind of guess at different ranges. And so I'm taking my range of data, the full range of case rate data, and binning it according to a set number I give it. So colorBin will split a palette that I'm defining. And so this is something
that's inside RColorBrewer. If you don't know about RColorBrewer definitely like Google
that and check that out. There are a lot of kind
of pre-made palettes that you can choose from. Orange-red, or OrRd, is the one
I'm working with here, as an example, to show you the range. And so we're going to
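A colorBin call along these lines might look like the following sketch (the data frame and column names are assumptions carried over from earlier):

```r
# A colorBin palette sketch: nine equal bins of the OrRd Brewer palette,
# spanning the full range of the (assumed) caserate column.
library(leaflet)

pal <- colorBin(
  palette = "OrRd",               # RColorBrewer orange-to-red palette
  bins    = 9,                    # nine is the max for this palette
  domain  = all_modzcta$caserate  # full data range, so the scale covers everything
)
```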
split all the possible kind of color values across
this palette into nine bins. And that's the max for this palette. This is really just an example for now. I'm not saying this is like
the correct thing to do. Especially for data viz, this is somewhere where you can really spend a lot of time tinkering and
playing with different metrics and seeing what works best
for the data you have, something you can do, especially because the
data looks like this, is to bin it such that, you know, you have kind of smaller-range bins at the lower end of the spectrum and then maybe larger ranges up here. But for this case, I'm going
to make them all equal size just to keep it as simple as possible. So I'm going to initialize this. And so that's defining that pal
function and here's the code that I've written to actually
make this interactive map. So it's going into this variable
I'm calling map interactive, which will be the actual object that's the interactive map. And I'm taking my data frame all MODZCTA and piping that into several things. So first, this is kind of a weird data, or I should say R cartography-type, thing. It's part of the sf package,
essentially transform the data we have into a
specific coordinate system. So that's going to like
orient the polygons a specific way. There's a lot to explain here, but the CRS is a thing called a coordinate reference system, and we're transforming our current geography data into this sort of format. And then we're going to
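That transformation step, sketched with sf (assuming `all_modzcta` is an sf object with polygon geometry):

```r
# Reproject the geography into the coordinate reference system leaflet expects.
library(sf)

# EPSG:4326 is WGS84 longitude/latitude
all_modzcta <- st_transform(all_modzcta, crs = 4326)
```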
run the leaflet function. So taking that data and running leaflet that's enough to sort
of like generate a map, but we have to sort of then set
the options within that map. And so to do so you can
add layers called tiles or in this case provider tiles. And this is really like the
type of map you're using. So I'm choosing the one
that's called CartoDB.Positron but this can be a lot of things. So just to show you an example, I'm going to run this part. Oops, you get this warning message, but you can pretty much ignore that. And so I haven't put any data on here yet because I haven't actually
defined anything about it. So that's why you don't
see any data on here. And all we have is this leaflet
map using open street map and it's the entire world. So we're not even looking
at New York City at this point but this is sort of what that
leaflet instance generates. And the CartoDB.Positron
is this type of map. You can use this to change the way the map looks, so there are different types of projections you can play with. As always, if you don't know what a function does, check the help section. And so for the map provider argument, there are lots of types of maps out there that you can play with. So the actual help guide
example is Stamen.Watercolor. And so let's just pop that in here and see what it does, just to give you a sense of the range. Stamen.Watercolor, I'm gonna run this. And I got this crazy surreal scene, but this is cool, right? You can imagine whole other types of maps. There are a lot of types you
can actually grab in here different map providers,
interface with leaflet and offer these up as
packages you can play with. So I think that's pretty cool, but if you remember what we think this data viz is for, the purpose of it, we're not trying to, you know, showcase some awesome art, even though maybe in another situation that might be what we're
trying to accomplish. This is a very cool map,
but at the same time we are using color in
this case to highlight the data we care about. So in this case, we're mapping
case rate in New York City. And so we actually want the cases to be the thing that's colored and that's what the eye
is going to be drawn to. As you load them, you can play with them and see you can zoom in on locations. You're getting all kinds of stuff here. Oh boy, it's actually quite beautiful. You can play with this and
look at the different maps and see how they look
at different resolutions and zoom settings. Let me reset the map to go back to the fairly boring-looking one, where we're going to put
some pretty stuff on there. So again, just running that
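Up to this point, the base-map skeleton is roughly the following (a sketch; object names are assumptions from earlier steps):

```r
# The base map so far: data piped into leaflet, plus a provider tile layer.
library(leaflet)

map_interactive <- all_modzcta %>%
  leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron)
  # swap in providers$Stamen.Watercolor to see the painterly tiles
```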
and now we get to the data. And so this is a lot,
there's a lot happening here. We're gonna walk through this slowly. So now we're in the add polygons section. There are lots of things you
can add to a leaflet map. You can add circle markers. You can add kind of like pop-up points. We're working with polygons
because the shape file data, that's why we went through all that work of joining in that geography data, are considered polygons, sort of like these irregular shapes. If you wanted to add sort
of standard like little dots or circles or some sort of like
uniform thing across there, there's a leaflet function for that. But in this case, we're adding polygons because we have very specific geometries. Those are those shape file data showing those MODZCTA regions. So we're using add polygons. And within this function, there are a lot of arguments
that you can play with. So the very first one
is we want to make sure we incorporate our labels
that we wrote up over here. So this is what's going to happen when we hover over any given polygon. So label equals labels; that's the labels up here. Some of the design factors, these are really not hard and fast; these are just things you can play with. So the stroke of the actual polygon shape, I'm just turning that off. Then smoothing: I'm adding just a little bit of smoothing, because some of the edges look a little bit kind of not super straight. And so I just upped it to 2.5, just to add a little bit of smoothing. So the lines aren't super
kind of jagged aesthetically it just looks a little bit nicer. It doesn't change anything
about the polygon itself opacity being one, fill opacity being 0.7, and so these are two
different features, right? So the opacity of the
kind of polygon outline and then fill opacity is like what's inside the polygon itself. And then the most important
part here actually ends up being fill color, because we're trying to make essentially a choropleth map. What that is, again, is recoloring regions
based on a data metric. So in this case, it's case rate
but it's not just case rate because we want to apply
that color function we defined up here. So for all case rate values we want to assign a
specific color on the scale that we set up. And that scale is going from
this orange-to-red set of colors, sort of a light, it actually looks a little more yellowy, light yellow color to a darker red. And so this is the notation for that, the function notation I should say. And so the polygon will be filled in according to the case rate there. The other options, again,
this is another place where you can fiddle around with and see how the different
features affect the shape and look, but we're adding
in some highlight options because we actually want
to sort of single out the polygon you're hovering over. So thinking for the user,
using the interactive map, if you're hovering over a region, you want there to be a
little bit of feedback being like okay this is the
one you're hovering over. So instead of, you know,
just a set of polygons colored in a sort of static way, they don't actually change any sort of color when
you hover over them. So this just gives the
user a bit more confidence they're hovering over the right thing and know what they're hovering over. So adding in a very light, it's kind of like a gray
scale highlight option. And this bringToFront setting is important here so that you actually see
that when you hover over it. And then the fill opacity is actually quite high, it's almost totally filled out. So, you know, you hover over it, it turns a little bit gray-whitish, and then when you move on to another one, it returns to its original color. Already here there's a
huge range of possibilities with what you can play with. Again, I'm going to keep saying this, but it's important: making data viz involves a lot of kind of like tinkering, and it's kind of like photo editing sometimes, when you are just playing around with different aesthetic features. But in this case, right? Like I'm making choices based on sort of like my
aesthetic preferences. Some of those have to
do with the data itself. I want to highlight certain
features about the data. Like I mentioned about
the color, for example but this is not the one
right way to do this. And also it's not the one right
data to actually play with. So there's a lot of stuff you can visualize from this dataset. If you have questions or you
don't know what's the right way to visualize something,
there are a lot of people you can chat with in
different forums online. And then also you can
always reach out to me if you have questions about stuff too. There is no one right data viz for something, right? There are some best practices
which we've talked about, but a lot of this is just getting a feel for how to make your own visualization. So don't feel like you
need to stick with exactly what I'm doing and feel
free to experiment. 'Cause that's where it gets fun. And the last thing we're going to do in this leaflet function. So again, every single one
of these is a pipe function. So addProviderTiles, addPolygons, it's very much the kind of dplyr pipe format. So we've added our polygons and then we're gonna add our legend, and we're going to put this
position on the bottom right. I'm giving it some opacity so it looks like it's hovering
a bit over the map. And then again, we have
to match our color scheme. So that pal function we defined before. Same thing up here, values
again, being case rate, and giving the legend a title as well too. So let's run this whole thing. And so you'll see sort
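Putting the pieces together, the standalone map described in this walkthrough might be assembled roughly like this (a sketch; `all_modzcta`, `labels`, `pal`, and the `caserate` column are assumed names carried over from earlier steps):

```r
# Rough assembly of the full standalone map (names are assumptions).
library(leaflet)

map_interactive <- all_modzcta %>%
  leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(
    label = labels,               # hover pop-ups built with sprintf/htmltools
    stroke = FALSE,               # no polygon outline
    smoothFactor = 2.5,           # edge smoothing mentioned in the walkthrough
    opacity = 1,
    fillOpacity = 0.7,
    fillColor = ~ pal(caserate),  # choropleth fill from the colorBin palette
    highlightOptions = highlightOptions(
      weight = 5,
      fillOpacity = 1,            # almost fully filled on hover
      color = "grey",
      bringToFront = TRUE         # lift the hovered polygon above its neighbors
    )
  ) %>%
  addLegend(
    "bottomright",
    pal = pal,
    values = ~ caserate,
    title = "Cases per 100,000",
    opacity = 0.7
  )
```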
of how the code lines up with what we see. I can run this map interactive
is the one we just made. One thing you'll probably remember is that our big data frame, all MODZCTA, had a lot of weeks in it. We went through a lot of work to get all those weeks in there. This doesn't actually specify a week; if you notice in the code, I'm not actually pointing
it to a specific week. So what this is doing is just
taking one snapshot of it and kind of stacking all the
polygons on top of each other. That's sort of why it
took a minute to load which is not what we
want in the final product but just for visualization
of kind of the aesthetics of the whole piece, this is what our kind of final product will roughly look like. Imagine it as a rough draft;
this is an interactive map. So I can, you know sort of play around and drag zoom in, zoom out, zoom in. There's a lot of data on this actually, you can see the structure
of the different boroughs and this is colored in based on case rate. As you can see, right? Every time you hover over something, you get a little bit of
feedback here from the coloring and you also get that pop up information that we work to construct. So you have the zip code and then you also have the number of cases per a hundred thousand people. And this is nice because you have that backdrop
with the information; from OpenStreetMap you have just the actual, you know, city itself behind it, but you then also have your polygons stacked on top, and the color is mapped across all possible cases. So our highest case number was like 1400-something, and so that's in this top range. We set it as a sort of auto
cut of the full spectrum into nine evenly spaced bins. And so that's why we
have these 200 size bins but of course, right? Like you can play with this. You can set it such that each
bin is a very defined range, and the bins are like, I don't know, zero to 50, then 50 to a hundred. If you do that, though, you have to, you know, remind the viewer that the bins are not equally sized. You might want to do that in this case just because of how the, excuse me, how the case numbers are distributed. But in this case, it's simpler to just cut it automatically
into this number. Especially with cases on the rise, adjusting the scale as cases rise is a kind of depressing thing to do, but you have to always check to make sure your data is actually within the scope of your color scale. Okay? So now that we
have our interactive map, the kind of standalone leaflet, you can save it using the saveWidget function. So doing this: this is the object name, so this one's called map interactive, and then you give it an actual file name, an actual HTML file. You can then load this up wherever. So this HTML file, you
can drag into any browser loaded up there, and
it's an interactive map. You can scroll around
it and view and share, which is really cool. So this is how you would
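In code form, assuming the map object from above is called `map_interactive` (the file name here is just an example):

```r
# Export the standalone leaflet map as a self-contained HTML file.
library(htmlwidgets)

saveWidget(map_interactive, "nyc_covid_map.html")
```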
export a leaflet map. Now we're gonna flip over to
building this into a shiny app. So if you, again, if you
have no idea what shiny is, and what I'm talking about, definitely check out the other video that I go into and explain
what shiny is and how it works. And so very fundamentally a shiny app is a way in which you can
use R to create a web app. So if you want to make
a new shiny app in R you can do this pretty easily, new file, lots of options you can do here. I'm going to say shiny web app. And when you do that, you give it a name. So in my case, I called it NYC COVID and you have the option
of doing multiple formats. So I am doing a single file. That's just my preference. But if you want, you can do multiple file where you have a UI file and a server file because for every shiny app you have to code for
both halves of the app. There's the UI, this is the user interface, and there's a server side where you would include your functions and computations that, you know, connect the
inputs and the outputs that the user is experiencing. I already did this, so I'm
not going to reinitialize it. This is my app.R script. So anytime you do this in R it gives you some really nice commented
out information here, to run it, you can literally
just hit the run app. I'm not going to do that just yet. 'Cause we're going to walk
through how to make this. RStudio has a bunch of
really great tutorials on how to work with shiny. I also included a couple
of resources specific to shiny in the links on the page. So definitely check those out too. The reason why I made
a new script for this and not over here is
just for kind of like, code project organization. You could in theory, do
this all in one script but in my opinion, that
gets a little bit messy. So just for clarity, I, for any shiny app I usually make a directory. And so this case is NYC COVID
where I have my app.R script. So this is the actual app
for the shiny web app. And then the other
thing you'll notice here is I put, actually this
is the wrong RDS file. I put the data frame that
we're going to work with here. And the reason I did that. So let me show you over here, all MODZCTA our data frame is all MODZCTA. And I want to save this as an RDS file that can then be initialized
for the app itself. And so this is how I'm connecting the two R scripts, in a way. You can do this in multiple ways. You can actually like initialize a script within another script and do like a lot of cool
connected script stuff, which is fun. But in this case, it's just one data frame that I want to reference that we already worked to build in this case I'm just
going to change the path to be actually inside this folder. So I'm going to redo this, and then you'll see it pop up over here. I'm going to get rid of this old one, since you don't need it. And that's the RDS we're
going to use for the app. And so this is really good to keep your app scripts in one place, just so you keep everything together. It's just a good practice for
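A sketch of that save-and-read hand-off between the two scripts (paths and names here are examples, not the actual project paths):

```r
# In the data-prep script: save the finished data frame into the app folder.
saveRDS(all_modzcta, "NYC-COVID/all_modzcta.RDS")   # example path

# In app.R: read it back in when the app starts.
all_modzcta <- readRDS("all_modzcta.RDS")
```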
kind of project organization when you're working with lots of code. So now that we're here in our app, just you remember, right? We're trying to build something like this, this type of map. But make it such that the
user kind of determines what you're looking at. Up top I have my libraries I'm initializing. As before, I'm going to use shiny for the app itself, leaflet for the interactive map, tidyverse for data wrangling, htmlwidgets again for the labels
on the interactive map, setting the working directory, I'm going to read in that
file from my app script. So that should already be in
the directory where you are, which is what I've set. Now we have the data frame
ready to go for the app. We have to define a UI
first for the application. And so again, this is what
the user is experiencing when they are interfacing with the app. So in terms of an experience for an app, right, you could either be inputting information, so making a selection for example, or you can be experiencing something, so like viewing a graph, playing with the interactive graph. So in that case, that's sort of what
we're going to build here in a sort of simple fashion. And so for the UI, it's
always a fluid page function. The structure of this that I've decided to kind of build here is I
want to use a sidebar layout. So that just means there's going to be a panel on the left and then a panel on the right, as well as just an easy
title panel at top too. When you're making a shiny app, the first thing you should always do is think about, for the UI at least, how things are going to work and where things are going
to exist with the app. I'm going to run this quickly
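Structurally, the UI skeleton is roughly the following (a bare-bones sketch; the title and panel contents are placeholders):

```r
# Minimal sidebarLayout skeleton: title on top, sidebar left, main panel right.
library(shiny)

ui <- fluidPage(
  titlePanel("NYC COVID-19 Tracker"),  # placeholder title
  sidebarLayout(
    sidebarPanel("inputs and explanatory text go here"),
    mainPanel("the leaflet maps go here")
  )
)
```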
just so you have a sense of what the interface
is going to look like. So let's hit run app,
it's a little play button. Okay, so this is the kind of final end product we're building. When you run the app, it creates a sort of local window tied to RStudio. You can actually also click this open-in-browser button, whatever you have. For now, we'll just stay here. So this is the layout I've chosen. Just because of the window itself, when it pops out it was kind
of looking like the sidebar was actually on top but that's just because
of the window dimensions. So in this case, right? I've additionally used a sidebar layout. So I have the sidebar panel here where I'm putting specific features and then my output is actually
going to be over here. And that's where the leaflet
map is going to live. And I'm putting my title here, which I now realize I need to fix, because remember this is not
actually ZCTA, it's MODZCTA. So I'm going to add that in a second. I have a URL here, and
then I also have some text about the data itself, which I'm also going to tinker with in a second. But the main thing here, right? The issue we had with our
sort of single leaflet map was that it sort of took all that data all those polygon shape
files for the week ending and just stack them all for
all the weeks, all the time which isn't really helpful
when talking about a trend; you're just looking at, like, one state of it. So in this case we're letting the user choose
the week to look at. So it initializes such that it starts when the data collection for this metric started. So that's the week ending August 8th of last year. And then beyond the case rate here, I'm actually also building tabs, since we have the data for
that in the frame for test rate and percent positive. Again, this is the beginning
of the data collection. So this is in August. That's why the colors kind
of look a little bit lower. I'm doing three different
color schemes here. Part of the reason I'm doing that is because I want, you know, each metric to have its own kind of unique identity. They're not highlighting the same thing, right? These are different metrics. If I used the same color scheme, someone could come in and think these things were all tied to each other. Of course they're connected,
but we don't want people to misinterpret what
they mean for case rate. I actually ended up changing this to this yellow green palette, just because I wanted to
provide some distinction from percent positive, which I think is arguably
the most concerning metric and carries a sense of urgency. So I added that red color
to be associated with that. You have to always think about, how colors are perceived by people. It's not just an accessibility thing of like someone being color blind, which is very important to consider but it's also the associations people have with color, right? So green here, people tend to
associate it as a calmer color, which is why I used it for the cases. In this case, you can really put whatever
color you want here. I wanted something that was just distinct for kind of like the sort of
scale with lighter being less and darker being more; I think doing the reverse would be confusing, in my opinion. But again, this is,
there's no science to this. It's very much an art in terms
of designing the graphics. And note that I chose a blue palette for test rate. And so now that you have
a sense of sort of like, let's go back into the nitty gritty, always remember to like close
the app or stop running it. Otherwise RStudio isn't really functional. So make sure you do that. So if you remember, I said I was going to fix the title panel. So now I'm going to make
it say modified ZCTA. So I'm not hiding at all, when I'm showing this, that these, you know, do line up with zip codes, they encapsulate zip codes and represent them, but they're not always individual actual zip codes, since we talked about those nuances. And so within the sidebar panel, that like clump of text I had, the first thing I had actually was a URL. And the way you do this, if you have any
experience with HTML, this is taking directly from
that to work with within R, and so what shiny lets you do is you add in this tags dollar sign A, which lets you create a URL. And so first you write in the URL link and then I want the text showing to just say data repository. This, you know, the data
that we pulled from, so anyone should be able to replicate this. And I'm also putting target equals blank, which is a way to make the link open in a new browser tab. So if you click it, instead of just refreshing and losing your app, it'll just create a new browser tab. Just a nice thing to add for the user's experience. And then there's the p tag, this is another kind
of way to add in texts for HTML and CSS. And here I'm just literally
writing some notes about the data. So someone would just stumbled upon my app looking at it for the first time. They have a sense of what,
where the data's coming from. So the very important thing
I wanted to highlight here is that data metrics
are aggregated by week, so categorized by a week ending in a date, so they have some reference point before they go down to the scrolling. Then, what does percent positive mean? Because that's not explicitly obvious from the data. So I'm saying it indicates the percentage of people that tested for COVID-19 with molecular tests who
tested positive, right? So the denominator is people who actually had
a molecular test for COVID, and the numerator is who tested positive; in that case, you get the percent. All data is sourced from the New York City Department of Health. The other thing I'm going
to do here is add in a note about MODZCTA. So if you wanted to as well here, you can add in another tag
link with the direct URL to that one, the one branch, excuse me, the one folder inside that repo that shows you that conversion table. You can do that if you want. Or you can just kind of say like, it's also in that data repo. So I'm just adding this
as again, more context. I think the more you
can explain the better. You don't want to overwhelm people, but I think being as upfront
about what you're showing, is always important. Okay. And then the most
important part for the user is the actual choice. So we're adding in a
select input tag for this. I'm just doing a select
input where I'm making a list of choices based on every
possible week ending in the data frame. And so that is, I'm doing that by unique
and then week ending. So any unique week ending
date is propagated into a list and then the text prompt
in that select input is just gonna be "select date (week ending)". And so that's what the user sees. And then organizing
the actual output side. What we're gonna do, you know, write the code for it below here is going to be a main panel. So this is again within
the sidebar layout. Now I'm working on the
main panel part of that. And then as you remember I had three tabs there that
you could select across. So this is going to be
our three leaflet maps. And so the way you do this is
you nest in a tab set panel. And so I have three tabs
here, one being case rate, test rate, and percent positive. The most important thing in any shiny app are the names you actually
give the different inputs and outputs. So always remember for
a shiny app, the server is where we're stitching together
the inputs and the outputs that's why the server always
has this format of function, input, output. And so we need to remember what we named different things in the UI. So for the select input it's date and we're going to talk about
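To make the naming concrete, here's a sketch of the matching UI pieces (the ids follow the walkthrough; the data frame `all_modzcta` and column `week_ending` are assumed names):

```r
# How the UI ids line up with the server (a sketch).
library(shiny)
library(leaflet)

# In the sidebar: the input id is "date"
selectInput(
  "date",
  "Select date (week ending):",
  choices = unique(all_modzcta$week_ending)  # one choice per week in the data
)

# In the main panel: the output ids are "cases", "tests", "pctpos"
tabsetPanel(
  tabPanel("Case Rate", leafletOutput("cases")),
  tabPanel("Test Rate", leafletOutput("tests")),
  tabPanel("Percent Positive", leafletOutput("pctpos"))
)
# On the server side these appear as input$date, output$cases, and so on.
```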
where that shows up in output. And then for the actual plots, the names we're giving
them our cases, tests, percent positives. So this can be literally anything I'm giving it something informative. So I don't forget what
the inputs are being named as well as the outputs. But yeah, again, if you ever forget this is where you have to
reference for the names and those names have to match. If you want there to be
any kind of interaction between inputs and outputs in your app, jumping to the server now, again, we have the format
of function input output and I'm gonna open these up in a second but just so you have a sense of the general structure of this, right? We're defining the sort of
function, and then to run the app you just run your UI and server. The first thing I'm doing here is actually creating a reactive. Part of the power of this, making it a reactive function I should say, is that this is dependent on
the choice the user is making. So this calculation, this
function only happens when the user is actually making a selection. And so what I'm doing within this reactive is I'm writing this
function that just sits here. I am taking my entire data frame and I'm selecting, I'm filtering, that data frame down to a specific week ending, which
was, you know, the goal here. And how am I defining that? I am saying input, which
tells the server, okay we're looking for something
that we named in the input and that name is date, right? And so that matches date. If I had more inputs, you know,
that's how you would call different inputs. In this case, I just have one. So this is how I'm
selecting the specific date. I care about that date, right? The way we named this is,
it's a list of week endings. And so one of these week
endings is being selected and I'm telling this reactive function to filter my big data frame
just for that week ending. So I have that set of
polygons to play with, right? I'm going to close this really quickly. Then we want to build our three tabs. So each tab is just gonna
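The reactive filter we just walked through might be sketched like this (names are assumptions carried over from earlier):

```r
# Filter the big data frame down to the user's chosen week.
library(dplyr)

week_zcta <- reactive({
  # Re-runs only when the user picks a new week in the "date" selectInput
  all_modzcta %>% filter(week_ending == input$date)
})
# Downstream code calls week_zcta() -- with parentheses, since it's a function.
```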
have a leaflet map on it. I'm not really adding anything else; you could, of course, if you wanted to, but for this I'm just making
a leaflet map for each. And this is going to look very familiar to what we just looked at for the the single leaflet
map we already made. So for cases, I have a pal color function. And so that is again using colorBin; in this case I am choosing a yellow-green palette. Again, check out RColorBrewer
like embedded color palettes. You can make your own if you wanted. There's a whole bunch of things you can do with color palettes
in R, and color in general. The thing here is that I'm
actually setting the domain for the color bin to be the entire range
of case rates, right? So no matter what week you jump to, it's one fixed color scale. That's really important
actually because you don't, the whole point of this app
is for people to compare the severity of case rate over time. So if you don't have a constant, you know, color scale
across all of your weeks, that becomes a kind of very difficult, at least visual, process for you. The numbers might tell you a story, but you want the color and the color scale to
match that experience. And just like everything
else we are using, we're making those pop-up labels
like just as we did before. So MODZCTA and case rate
are going to show up there so we know where we are. But if you notice here I'm not taking the entire
data frame anymore. What I'm doing is I'm taking week_ZCTA so what was week_ZCTA again? This was this reactive function I defined. So no longer is this a data frame. And this is why I have this
extra set of brackets here because I'm taking a function actually and calling something from it. And the reason why it's a function is because this is happening only when someone is picking a week ending. So it looks a little
weird like it's not normal kind of data frame, column selection but this is, this will work. This is the kind of
inside shiny nitty gritty. Now we get into the actual leaflet map. This is almost identical
to what we had before. Big difference I'm not starting from my entire big all MODZCTA data frame. I am just starting from week_ZCTA. So this is again a function. So that's why we have
these closed brackets here starting from that, because
it has the data we needed it has the geography data,
it has all that stuff but now we're running it off of that. It's the filtered list
of polygons as it were, leaflet again, we're
adding our provider tiles. I'm actually doing one more thing here which is sort of optional, but it makes the work kind of
a little bit faster for shiny. So think of, these are all processes happening every time you
make a choice in the app. So what this is, is I'm actually setting the scope of the map from the outset before I even add any polygons what this is, is actually
coordinates for New York City. This takes a little bit of trial and error to get the right initial kind of viewpoint, but we're taking the coordinates, longitude and latitude, of New York City. You can Google this and look this up for wherever you're mapping
and also setting a zoom so that I'm focused sort of in this space. So you have a view of all the boroughs without like too much extra from, you know, surrounding
states and things like that. So that's just like a fun leaflet trick. And then we're adding in our polygons. And this is again, identical
Then we're adding in our polygons, again identical to what we've done before. The only difference is I'm saying week_ZCTA instead of the full data frame; it's still a function, so we're using the parentheses. Then we're adding a legend. The legend isn't tied to that reactive; it's tied to the broader data frame, which is why I just reference the case rate column from the full data frame there.
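A sketch of that render step (the palette `pal`, the column name `case_rate`, and the output name are assumptions standing in for the app's actual code):

```r
library(shiny)
library(leaflet)

# The map is rebuilt from the filtered reactive, week_zcta(),
# but the legend's values come from the FULL data frame, so the
# color scale stays consistent from week to week.
output$cases <- renderLeaflet({
  leaflet(week_zcta()) %>%
    addProviderTiles(providers$CartoDB.Positron) %>%
    setView(lng = -74.0, lat = 40.7, zoom = 10) %>%
    addPolygons(fillColor = ~pal(case_rate),
                weight = 1, fillOpacity = 0.7) %>%
    addLegend(pal = pal,
              values = all_modzcta$case_rate,
              title = "Cases per 100,000")
})
```

Note how the polygons draw from `week_zcta()` while `addLegend()` draws from the unfiltered data frame, so jumping between weeks doesn't rescale the legend.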
This is how you modify your leaflet code to make it into something interactive in a Shiny app. Note also that to generate a leaflet map within Shiny, you have to run this renderLeaflet() function; that's what creates the output. It's the exact same thing for tests and percent positive. I'll just show you: I chose a purple-blue scale for tests, and for percent positive I used the orange-red color scheme for the palette. I'm also gonna harp again on
this idea of how we identify the specific output tabs. We have to actually name them in the UI. So again, much like we had for the date input object, we want to call specific output names. Just as I said before, we coded three leaflet outputs.
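The UI side of that pairing might look like this (the tab labels, and the lowercase spellings of the output names, are assumptions; the transcript only gives the names aloud):

```r
library(shiny)
library(leaflet)

# UI-side sketch: each tab requests one of the leaflet outputs
# defined in the server. The strings passed to leafletOutput()
# must match the server's output names (output$cases, etc.) exactly.
ui <- fluidPage(
  tabsetPanel(
    tabPanel("Cases", leafletOutput("cases")),
    tabPanel("Tests", leafletOutput("tests")),
    tabPanel("Percent Positive", leafletOutput("pctpos"))
  )
)
```

This name matching is the whole mechanism by which the UI and server communicate.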
They have the names cases, tests, and pctpos, so that's what they're called here, and this is how the UI and server then communicate with each other. So let's run it one more time. Now we have our updated text here, where I give a little bit more of a rundown, and you can click the data repo link to go straight to the GitHub. And we have our three tabs, which look really nice. So let's jump
around a bit, actually. Let's go to the end of November, just so that you see the maps update with the new data. Already we're seeing some darker colors here, suggesting higher numbers for cases as well as test rate and percent positive. And then our most recent data point is not great; at this time, cases have really been taking off across the city and the state. And this is reflected by a couple of metrics. For the case data, you can see the actual number of cases in this specific week has increased quite a bit, and the number of tests as well, but they're telling somewhat different
stories, right? The interpretation changes somewhat. And so this is a good place to pause, right? You have a first pass at a Shiny app, and you look at it like, okay, does this make sense? Does this jibe with what I was experiencing with the data when I was inspecting it on the front end? Is this the sort of story I want to tell? At this point, imagine you're writing an essay and this is the first draft: you go and ask, all right, what are the interpretations from this? What can be improved to better drive home a specific story? Can we add any more context, annotations, things like that? But bare bones, this is pretty good for a first effort. I think
there are places where you can improve on this, and you definitely should try. For example, people might ask why the cases, tests, and percent positive maps look out of sync with one another; maybe they're just noticing, oh, this one is dark but that one isn't, so why is that? That's a complicated thing to answer, so it might require another text blurb or explanation of what's been happening. If you notice discrepancies in the data, this is also a cool opportunity to reach out, through the GitHub or elsewhere, to the New York City Department of Health and ask, hey, did you experience some data recording or reporting issues? As I said before, the holidays hit a lot of people hard in the sense of COVID data lagging across pretty much all fronts. So this is a very cool way for people to experience
COVID data, right? You have this level of interactivity. People living in New York City might say, well, I live in this ZIP code, let's see how it looks for cases, for tests, for percent positive, and maybe how to interpret different levels of cases, too. So I'm just going to jump around a bit. Yeah, this is right before the holidays, right? So this might be a bit of a
precarious time for reporting. If you're wondering why the scale looks the way it does when most of the data sits in this lower zone, it's because we had a couple of pretty intense weeks where certain ZIP codes and regions had very high numbers of cases per hundred thousand. An easy way to improve this would be to change the bin structure, for example, to better cover the variability in these lower levels of cases.
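As an illustrative sketch (these break points are made up, not the app's actual bins), rebinning with leaflet's `colorBin()` could look like:

```r
library(leaflet)

# Finer breaks at the low end of the scale give more contrast where
# most weeks' case rates actually sit, while one open-ended top bin
# absorbs the few very intense weeks.
bins <- c(0, 25, 50, 100, 250, 500, Inf)
pal  <- colorBin("OrRd", domain = NULL, bins = bins)
```

Swapping this `pal` into the existing `addPolygons()`/`addLegend()` calls is all it would take; the map and legend pick up the new bin structure automatically.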
This is a good starting-off point for people getting into data viz and visualization. I talked about a bunch of really cool tools that R and RStudio offer. Shiny and leaflet are great ways to create interactive data visualizations without going through more front-end tools like HTML and CSS, and they let you really have fun with the data; there are a lot of possibilities here. You can make all kinds of Shiny apps, or you don't even have to make a Shiny app. You can just play with a
single leaflet interactive map. I hope this was helpful, and I hope you guys end up making some even better visualizations than this, showing all kinds of creative stuff. You can take this beyond COVID data, of course, and apply it to whatever your favorite data set is. But since we are talking about COVID-19, the most important thing is transparency and understanding of the data. And again, as I said in the other video, when putting out data viz for COVID-19, always ask yourself first: what new information is this data viz showing that hasn't already been shown? Does it serve a public need? Will it be wildly misinterpreted? Have I done everything possible to make sure that enough context has been provided for this visualization, and that the data is solid? The stakes are unfortunately too high for something like COVID viz to be careless about that, right? So always keep that in mind. But again, the data is out there to play with and practice on in your own time. So whether or not you decide to publish any of this, there's a lot you can do with the data even just to understand for yourself how these visualizations are made, and to think about the best ways you would want to visualize it for yourself.