R Markdown Advanced Tips to Become a Better Data Scientist & RStudio Connect | With Tom Mock

Captions
Hey everybody, welcome to the RStudio live stream. We're going to be covering a few different things today; namely, we're going to be going over a lot of R Markdown, from my home here in Texas. So I'm really excited to be here with everybody today. I'm going to get some questions from the chat, we'll do a little bit of live coding, and I've got a slide deck. I hope you're going to learn something with me today, and we're going to have fun doing this as an experiment. This is one of my first times live streaming and my first time here on YouTube Live; I've done a little bit on Twitch. So I'm excited to learn along with all of y'all as we go through this experiment. Let me know if you have any questions and we'll try to answer some of them in the chat, and at the very end we'll also upload the video to YouTube. One of my colleagues is going to be answering questions in the chat as well, and sharing some links as we go along. So let's get kicked off. I'm going to start sharing my screen and we'll go from there.

All right, so for the first part I'm going to have a slide deck. You can find the slide deck, along with all the corresponding materials, at bit.ly slash rmd marvel. I love the Captain Marvel movies, so I kind of built the idea around some of that; there are just some little images, playing around with that idea. So the concept for today is going higher, further, faster with marvelous R Markdown. R Markdown is amazing. It's one of my favorite R packages, one of my favorite data products, one of my favorite things for just doing things in R and other programming languages. I'll also be working inside RStudio primarily, so you might see me log into RStudio here. I want to be very clear that everything I'm showing today, up until the very, very end of the presentation, can be done in RStudio Desktop or RStudio Server or RStudio Workbench. So there's really not
anything here that's proprietary per se. The vast majority of this is open-source R Markdown, which is amazing. You can be very productive in your school, in your enterprise, or as your hobby without really having to go and buy anything. If you do want to go a bit further with RStudio Connect, I'll talk about that at the very end, but that's a small component of today's presentation.

So let me go ahead and jump into the presentation. The idea here, again, like I mentioned with Captain Marvel, is that every heroine's story has three core acts, and just like that, we'll go through some of that today. Act one is where the story about our heroine is set up. In act two, some type of complication arises that they have to face. And in act three, the heroine finds resolution, defeats the bad guy, and wins the day. So for us, we're going to be talking about the tool: here's R Markdown, here are some problems, and here are some solutions, basically some workflows you can use to solve problems with R Markdown.

The way I like to think about R Markdown is that it's a lot more than just literate programming, although that's a huge component. R Markdown as a taxonomy can do many different things. It can be an environment where you're writing code and writing text and creating graphics and creating statistical models, or data science, or machine learning. But it can also just be an end result as a data product: you can create something like this slide deck I'm presenting, which was made with R Markdown. You can use it as a control document, to control your code flow and how you organize your projects and bring different snippets of code together. And lastly, you can also use it for templating, so basically leveling up your skills and taking one thing and turning it into a hundred, or taking one example and adjusting it and recreating it without having to rewrite all the code. So we'll be going into all four of those different taxonomies today. The first one, and I think
obviously one of the core workflows that you're going to do in R Markdown, is literate programming. The goal here is that you're going to capture your code, your text or comments, and your output in a single document. So you have essentially this reproducible workflow of capturing your code along with the output, along with any comments about what the code is doing or the outputs that you're seeing, all together. You can share one document or one report with someone if you want to do that, but you can also reproduce it, because you have the source code.

The idea of literate programming has been around for a long time. Donald Knuth has this nice quote about literate programming as a programming paradigm, and the explanation of it is that it's logic in a natural language, such as English, interspersed with snippets of macros and traditional source code, from which the compilable source code can be generated. Now, when we think about source code in R, we have code chunks. So you have these little portions here where you might recognize some of these, you know, your favorite ggplot, some dplyr to filter a data set. But then with R Markdown you can also intersperse actual text, and write about it just as you would in a word processor, but now you get the benefit of doing code and text together.

The other component I just want to bring up here is that there is the YAML header, which allows you to define certain metadata. So up here at the top you can say, I want my output to be an HTML document. So when it's rendered, when you finish it and you click render, it will actually create an HTML object that you can open up in a web browser. And you could switch this out to do xaringan or any other type of thing, like a presentation, as I'm doing today.

Now, going a step further beyond just that plain text, visual R Markdown was introduced in RStudio 1.4. So this has been around for six or seven months, but there are still lots of people who may not have heard about it, so I
wanted to cover it today. You can essentially think of this as: not only do you have the ability to look at the source code and the raw markdown, you can also view it in a way where it shows you the actual rendered outputs. So when you create a table, it shows you a table; when you include a graphic, it shows you the graphic. You don't just have to believe that, okay, this code should bring in this image. We'll actually do that live inside RStudio. In addition to that, it can do things with LaTeX, or real-time spell checking, or all the different source code that you still want to be using. And you get a nice header up here that allows you to add things like bold, italics, strikethroughs, bullet points, tables, all of those things with insertion, but it still allows you to do that with code, and it actually adds the code into it. I'll be walking through this live in a second here, so this may seem like, oh, this is really cool, but we'll see it here in a second.

And then the last part is that R Markdown and literate programming are not just for R. You can do things like include Python via reticulate, you can use SQL inside code chunks to actually write raw SQL code, you can include CSS or JavaScript for some of your front-end development work, and you can include other things for Bash, Rcpp, Stan, and a lot of other different languages. So altogether, knitr provides interfaces to 52 possible languages. So you can do a lot in R Markdown, even though a lot of what I'll be showing today will be focused on R.

So with that, let's hop over into RStudio. I'm working inside the RStudio Team evaluation. Again, this is not necessary; it's just a reproducible environment I can use. So I'm actually logged into an RStudio Workbench session. This might look a bit different initially from what you're used to if you're using RStudio on, say, your desktop, but the workflows are essentially identical. Once I log into RStudio, I can hop in there and open up some stuff I was working on
over the past few days and week, and we can go into some live coding of literate programming. I'm going to be bouncing back and forth between the source document editing that I'm doing here in the live coding and the presentation, so stick with me, ask questions if they come up, and I look forward to learning with you as we go along.

So there's a lot going on in here, so I'm going to close down some objects, but in reality we're just inside the RStudio environment. So again, this will work just the same on RStudio Desktop as it would inside RStudio Workbench or Server Pro or Server. The first one I'll be doing is the visual editor. So if you again want to go to bit.ly slash rmd marvel, you can actually open up the same files I'm working on. The first one I'm going to use is this visual Rmd file. This file will look pretty similar to any R Markdown document you've seen before. There's a YAML header, there are some code chunks, there's a bunch of text, there's a table here, there are some images, all sorts of things going on.

If I want to switch over into visual editor mode, I can click on the visual editor button right here. This button will switch me to the visual markdown editor. Once I click that, it'll reload the screen and it will then convert what I'm seeing on the page into the actual represented output. So now, rather than three pound signs and "intro", it actually creates a level three header. All of my links get turned into hyperlinks. My code chunks get a gray background, and I can still execute them; I can still run tidyverse and load the palmerpenguins library, but I don't have all the raw text around it. It will load tables, and I can adjust the size of those tables, so we can resize this table real quick and move this over, and now we have a nice table. So that large table that I have here with embedded hyperlinks is now a visual table, and it's got some included graphics. Basically it's just walking you
through how to use visual R Markdown, and there are a lot of cool things you can do with visual R Markdown.

What might be more useful is jumping into an actual code example, as opposed to a demo of what visual markdown does. So if we go to the penguins folder, we've got a couple of different documents in here. I'm going to start with the penguin report detail, and again I'm back in the plain text version, so you can see it has still loaded some of the graphics, and there's some initial exploration here. What I want to do is walk through this as a literate programming example, so we're going to move into visual markdown mode. So again I'll click the visual markdown button, and now we have the formatted text along with my code chunks. We're going to use a couple of libraries, so I'm going to load tidyverse, palmerpenguins, broom, skimr, and the bootstraplib package. I've got a couple of different sections here talking about what literate programming is, some initial exploration, how to use skimr, and some examples of inline code chunks, and what we're going to do is a quick skim of these data.

If you're not familiar with the palmerpenguins data set, it's a relatively small data set, about 344 rows, and it includes three different species of penguins, both male and female. This is a nice alternative to, say, the iris data set, so you can use it to replace some of that in your workflows. And skimr is just showing us what the different variables in this are. So there's a lot of information about the bill length, basically how long their beak is, as well as the bill depth, how broad it is, and then their flipper length, or how long their arms are, their body weight, their body mass, and the year that this was observed. All the data was collected in 2008, and big thanks to Allison Horst and Alison Hill for putting together this data set into a nice R package with a wonderful website. So now, with basically three or four lines of code, it's able to get all
this metadata about this data set. So there's a little bit of missing data, the average grouped by species for bill length and all the different variables, the standard deviation, and a histogram of what that data set looks like.

Now you might be saying, why didn't I just use dplyr? And you can still do that. So I'm inside the visual editor, and I can say, okay, we're also going to look at some specific statistics. We'll split the groups by species and look at which ones are the lightest and which ones have the shortest flippers. Oh, looks like I made a mistake, and RStudio is catching that I misspelled "lightest", so I will correct that. Thank you for the warning there. And now I'll run this code chunk and we'll get our dplyr output. So we've grouped by species and sex, we've summarized, got the weight of the animals and the flipper length of the animals, and we've arranged by descending weight. What we observe here is that the Adélie penguins look to be the lightest, although there are some NAs in the sex, so some of the missing data there is in the sex column. The males are about equivalent to the Chinstrap males, but the females are much lighter than the rest of the other ones.

Okay, so it looks like the Adélie are the lightest penguins. I can continue going into my code, and I can filter on the is.na's and find the ones that actually have this. If I want to make it so there aren't any NAs in the sex, I can also just change that; the exclamation point here does the opposite, so it finds the rows with no NAs in the data set. So we've only got about 11 rows of the 340-ish that are missing data. But let's go ahead and make sure that we're not missing any animals that are missing body weight, so we'll find the smaller data set. So I'll go through this, and really the idea here is that we've got a lot of text, we've got interspersed snippets of code, we've got graphics that are going to show up inline, and we can create even fancier graphics or
split it out by specific portions. Nice graphics here. And then we can fit a small linear model between the body mass and the flipper length. There's a very strong linear relationship between how big the animal is and how long its flippers are. This might seem silly; we're sticking to a fun data set because I don't want us to worry too much about what the data is, as opposed to the techniques we're using.

Really, what we've done here for a few minutes is just walk through a live coding example, and it's really just me running code. But you can see that if I gave this to a colleague, or I was looking at this myself, this is nice to read through. This reads almost like a report already, even though I'm able to edit it and change things as I go. So you get this experience of a word processor with the ability to run that code, and that's really what the beauty of literate programming is: mixing in the comments or the things you're writing along with the code you're writing.

The other part I want to show, if we go back to the visual editor, is that we can also do this same workflow with reticulate. I can actually use some Python. So I'm going to again switch to visual editor mode and just show you that it does still work for Python and R. So we'll load these libraries. I've loaded reticulate to allow me to use Python on the back end, and then I can import pandas and get the penguins data set into a pandas object. Now if I look at the environment, I have the ability to switch between R and Python and see those different objects. I'm in the Python environment now and I can see the data set that I brought in, so you can see the same one. You'll notice that Python refers to NAs, or missing data, as NaN, whereas R is going to refer to those as NAs. So it's a small difference, but it does translate back and forth nicely. So let's close that down, and then, since I showed you the workflow of doing this live, let's just knit the document. So
we'll take the reticulate document I have, which has a little bit of Python. It does some data cleaning in pandas as opposed to dplyr, and we can show the final output here in the browser. So this is very important penguin analysis that we're doing here. Again, we're pulling in pandas. Instead of skimr, we're using the describe function from pandas, and we're looking at a few different things by the species. This is done in Python. And then we'll do some group-by summaries with pandas again, so again grouping by species and sex, getting some outputs, and then showing the data set when we drop some NAs and filter to just species equal to Adélie. We can then take that Python object and pipe it back into ggplot. I'm much more comfortable in ggplot than I am in matplotlib or other libraries in Python, so I just moved it back in. But really the idea is that if you know a little bit of R and a little bit of Python, you can work with them in the same document, and use that for learning, or use it for some of the reporting or other scientific computing that you're doing.

Cool. So we spent quite a bit of time on literate programming. We're going to go back into the presentation and talk about our next taxonomy, and also take a breath and look at some of the questions that are coming in. Question number one was: has anyone found a good method of real-time collaboration on R Markdown, in a similar way to Google Docs? That's a great question. Obviously you could do something that's more asynchronous by checking into something like version control. So just as I was saying with using the code, rmd marvel, if you go to, say, GitHub, you can literally use the things that I'm working on and edit those. So I was just in the visual editor, and the one I was working on was the reticulate dash doc. So now you could add some edits to this, you could pull it in and work on it. That would be asynchronous. An alternative would be that RStudio Workbench actually allows
you to do live editing of the same document with project sharing. That is a professional product that we offer, but it is an option if you're worried about collaborating at work and needing to work remotely but still share documents. You could also just have a shared drive; if you're talking about Google, you can upload documents, edit them, and move them back and forth. I still think that version control is probably the best idea for just being able to comment on something, make edits, and then try to talk through that process.

One other thing: how do you have that many panes? Let's go back into this for a second. So RStudio 1.4 introduced not only the visual editor, which again is this button that allows you to show the rendered output, but also added support for multiple source panes. So if I go to Tools and go to Global Options, Pane Layout, I can add new source columns. You can see I have my four typical panes, but then I have an additional source column. If I apply this and click OK, go back in here and drag this over, I now have the ability to work on a single document in the middle and also have a more basic report off to the side, so I can edit two files at the same time. You're not editing the same file, but two different files. I use this quite a bit for doing things like Shiny, where you might be writing essentially the front-end portion, the user interface, and the back-end server component in separate files and bringing them both in together. So that's a good option, but let's hop back into the presentation real quick.

So we talked about literate programming for a while. The next taxonomy of using R Markdown is the idea of a data product, and here we're using R Markdown and using R to generate a final output for consumption. So yes, you want to be reproducible, and yes, you want to include code, but here you're mostly concerned with: I want to create something that I'm going to give to
someone to help them make a decision, convince them, or just report on something, like here's the daily report, or here's a machine learning pipeline I ran with tidymodels, fit all these different parameters, did hyperparameter tuning, and here's the final model and its fit, but with all of the code available.

There are a ton of different data products you can create with R Markdown. You can create reports: static HTML reports, like I was just showing when we rendered that document; PDFs you can save out, so you could share them via email or just have something you can have offline; rich text formats, GitHub documents, or even Word documents. Presentations, just like this one: this one was created in xaringan, which is an R Markdown package using reveal.js, or remark.js, I'm sorry. You can create PowerPoint presentations, you can create other, more native reveal.js presentations, or Beamer for LaTeX. You can create full-blown dashboards with flexdashboard that can be either static, or you can integrate Shiny and make them interactive. Entire websites: some of y'all may actually have R Markdown blogs. I personally use distill for my personal blog, and blogdown is a great option with a lot of power that uses Hugo on the back end to convert Markdown and R Markdown into complex, sophisticated websites. You can write books, beautiful books that people have literally published. Think of R for Data Science, the book by Hadley Wickham and Garrett Grolemund; that was written with bookdown, and you can actually look at it online in the bookdown format.

And lastly, one of the things I'm really excited about for R Markdown and data products is HTML widgets. You may have been exposed to DT, or the JavaScript DataTables library, or reactable, which is similar and creates interactive tables in R that use JavaScript on the back end. So without even having to use Shiny or a server, you can have the ability to sort, rearrange, or change the data you're
seeing on screen, all at once. There's plotly for graphics and crosstalk for interaction. We'll dive into a little bit of that.

So we're going to hop back into the environment here. We'll go back into my penguin project. I'm again going to push this extra pane over, because I want to zoom in a little bit more on this one, and we're going to go into the penguin project and look at data products. The first one we're going to be looking at is, well, I was talking about different output formats, so distill being the ability to create a web page in R. So let's close down some of these, we'll save that, and clean up my environment a little bit. I've got a lot going on here, y'all. So close this one, close the README, close down some whiskers, the multiplied. We've got a lot of exciting things today and I don't want to get to them too fast, so we're going to close down all of these, and then I'm going to pull this one over.

This has my penguin distill, which again is penguin dash distill in 02 data products. This may look almost the same as what you were looking at before with your visual R Markdown. It's essentially the same report, or similar, but when I knit it or render it, that's when it gets turned into the new format. So with the visual markdown editor you still get all this nice formatting, but when I actually click Knit and ask R and R Markdown to render it into the new format, it will render it out as something else. So in this case, because I have it set as distill, so you can see output: distill dash, I'm sorry, distill::distill_article, it's going to create an actual distill article instead of a basic HTML doc.

So here's penguins distilled. This looks quite a bit different. I love that it has this floating table of contents and that it has some information about when it was published and who the author is. It's more like scientific writing native to the web. So this is an HTML document, but it is really about how to do scientific writing, and again you still get all the niceties and
the core R Markdown work. So this is still the same code I was using, but I was able to generate a different output simply by changing the R Markdown YAML. So there are still a lot of the great things, but it's got really, really nice headers, where I can grab this specific link to the modeling section and share that link, or I can look at the different models that I've chosen at the bottom and what their R-squareds are at the very bottom here. I can do the wrap-up, and it gives me this nice table of contents where I can jump back and forth between everything. So I really like distill. That's a good option if you're trying to create a personal website or some scientific articles.

So let's get back into here. I'm going to click on this, drag this one all the way over, and the next one we're going to do is a xaringan presentation. So again, the data product here is more of a presentation. Just like the presentation I'm showing you, this one was created with R Markdown; I literally wrote a bunch of R Markdown code to generate the presentation that we're going through. This looks pretty similar to what I was doing before, in that it's got really the same code chunks. The specialness here is that, to tell this presentation format "this is a new slide", I use three dashes to say go to the next slide, go to the next slide, for each of these different sections. So each one is a slide break, but I can still use essentially the same code I was using.

Now, this really makes more sense when, again, I render it out. So if I knit this document, or render the R Markdown into a xaringan presentation, I can then open that document, and give me one second to pull it over, and now we have that same report I was talking about as a live presentation in a few seconds. So here's the information we're looking at: here are some penguins, and we're looking at their bill length, their bill depth, their body mass, their flipper length, all these fun
little data points. We're talking about literate programming. Here are the specific statistics I'm interested in, so grouping the data by species and sex and getting their body weight and their flipper length. I can clean up the data; there were some NAs and we dropped all those with na.omit. And now let's get into the plotting section. So we're going to show a quick distribution of the data, we'll have the example of the live code up here, and then we'll show a linear model and a scatter plot. This is again the same ggplot that I had before; I didn't have to adjust anything. And then we'll move on to the modeling section. So now we have all of the models I'm fitting. I've got a list object of some different formulas, and then we're going to summarize those into different models, use broom to clean them up, and now I have a nice table, with the gt package, of the different model formulas I was using: looking at flipper length by body mass, the R-squared, the p-values, the sigma, the statistics about these. So obviously I'm flying through this portion, but the idea here is that with literally no change to the underlying code, I'm able to get all of the different things into this object, so get them into a presentation.

There's one more example I'm going to go through about data products, and then we're going to talk a little bit more in the presentation. So, number one, flexdashboard is a way to create dashboards in R. What I can do is actually run this document. I'm running the document because it has a Shiny runtime. Shiny is not required for flexdashboard, but it does give it a lot of extra power, in that now you can have a server component on the back end and execute something. The basic idea of this flexdashboard is that it separates different components into portions. So I have a tab here for miles per gallon by weight, a tab here for miles per gallon by cylinders, and then a table about them. What Shiny is doing is giving me linked brushing, or the ability to
select specific objects or observations and then show something different. So by using this as a control environment, I'm able to limit it to a specific miles per gallon and weight combination, or even just one data point, or I can get a whole section and zoom in on those, and that adjusts both the table and the graphics here. This is done with Shiny, and it's taken literally verbatim from the flexdashboard site, so you can actually recreate this; the source code that they share is embedded inside of it.

The Shiny runtime is very exciting, but it does require you to manage a Shiny Server or RStudio Connect on the back end. So an alternative, because you're working in R Markdown, is something like crosstalk. What crosstalk does is give you the ability to cross-communicate, or crosstalk, between different HTML widgets. An HTML widget might be a plotly graphic, or a DT data table, or a reactable table, or all sorts of different HTML widgets. The idea is that those are using JavaScript, they're written via R, and they can still communicate with each other.

There was a question in the chat: does knit mean render? Essentially, it's a knitr-based function. What knit or render means is: take the document, run it from the top to the bottom in a linear fashion, and create the output format. There are all sorts of side effects that are good, in that you could write out different files. The idea of render is: run the entire document and render the output, or knit the document and render the output. Those are interchangeable. You can use the RStudio Knit button, or if you're doing it from the R console, you could call rmarkdown::render and just pass the file name to it. So it's not required to use it inside RStudio, but we do provide a Knit button for it.

So with that, let's knit the document real quick and wait for it to render. It's going to go through, reading the lines and pulling in the code, and it's going to generate an output for us to
walk through. While that's rendering, I'm going to look at some of the questions in the chat. There's a great question about jobs, the Jobs pane, that I want to talk about in a second. Okay, and this one was output to reactable, so let's open that in a web browser. That's not the one that's exciting, though. Let's do the crosstalk HTML.

So the output that we created was a crosstalk document. Again, this might look like the exact same plot we've been creating for a little while, flipper length and body mass, so there's a linear relationship between them. But the beauty here is that with crosstalk I have an interactive graphic. With all these little points I'm hovering over, the hover text comes in for free; you get the hover text in all these different documents. With the table, you have the ability to paginate, or sort by species, sort by sex, sort by any column. So you have interactive components, interactive graphics, interactive tables, all of this available for free, and all of this able to be used within your enterprise, within your hobby, within your workspace.

What crosstalk is doing is giving you some of these filters. So this may look exactly like Shiny, but there is no Shiny server on the back end; this is all done in the browser on your local machine. I can filter this data set to only show female penguins, so now the only sex in the table is female, and all the observations we're seeing up here are female only. I can show just the males and do the same thing, or I can show the males and the females together. Great. I can filter the body mass to show only the lightest penguins, and we can see the drop-off of the Gentoo as we get to about 4,000 grams; there aren't really any Gentoo penguins that we observe at that body weight. So again, all of this is happening where I'm changing things here, it's affecting the plotly graphic, it's affecting the table, and they're communicating together to give you this output. And again, the beauty of
r markdown and the beauty of crosstalk and these javascript libraries is you don't have to manage a server you could literally publish this to github or netlify or rstudio connect or just take the file and literally like deliver it to someone else by like email if you wanted to or by shared drive they could open it and get this interactivity that you're expecting so all these amazing things you can do with crosstalk and r markdown you can still leverage shiny on top of that if you wanted to to get even more power or to actually execute new r code you are limited to what these libraries can do in javascript but if you add shiny then anything you can do in r can be added into the object as well so back into here there was a great question in the chat about the jobs panel and part of the idea is that could you execute r markdown renders as a background job the jobs panel might be new to a lot of folks this is again available in the rstudio desktop the local ide it's a free thing it's just available you could render any r markdown if you wanted to as a background job you would just have an r script that literally contained like rmarkdown::render and then whatever the r markdown document you wanted to do so you could parallelize some work if you wanted to like you know render this one i know it's going to take a while also render this one and then that leaves you free to continue using the r console to do other things so these jobs are still executed in your local environment but behind the scenes so they don't you know use the same r session that you're using you do them in a background or an alternative r environment um so yeah so the jobs panel is amazing that was also introduced relatively recently you can use it what i use it for a lot is long-running computation like parameter tuning or grid search where you're going through a lot of different things all at once it could take 30 minutes or 45 minutes or even longer in some cases i don't
want to lose everything like lose the ability to use my console for 30 minutes so when i was watching sliced i was really hoping a lot of people were going to be using background jobs so they can continue like doing graphics in the r console while the code is fitting in the back end that would be a cool example um let's hop back into the presentation we talked about quite a few of these different data products we've got about 25 30 minutes left so i want to make sure that we get through everything um for the table column width um the basic idea for the visual editor is when you you know bring in a table or create a table it has default widths those will adjust to the size of the contents there are some cases where if you have like really really long content it can create a table that you need to adjust and you can just drag it over with your mouse to adjust the size of the column for xaringan and all the colors and layout styles i highly recommend the xaringanthemer package by garrick aden-buie that is an amazing package that allows you to write full themes for xaringan and you can get you know beautiful colors and specific um you know fonts or css embedded in there as well to make it custom for gtsummary tables and r markdown um i use gt all the time inside r markdown i don't think there's any known issues with gtsummary um but the idea would be that if you use gtsummary within r markdown you can just render it out and get the gtsummary table in there as well a great question here are all these features you're showing in rstudio free yes everything i'm talking about right now is free these are open source packages some of them are community based some of them are rstudio based this is the beauty of open source and the r community and the javascript community and the python community is that all these different components can be connected together kind of you know revamped or integrated and you get this amazing power within free and open source tools um last
question i'll answer here is you could use different yaml so you could uh like render one as a xaringan one as a distill the one difference is that xaringan has a very different internal syntax those three dashes are going to create new slides in xaringan but they will create a horizontal line in distill or r markdown so while you could have one document literally generate two or five different types of output you may have to change some of the skeleton of it not the code but the skeleton of like how it presents and the last one here when you do render when the document is a flexdashboard you just render it just as you would with anything else the difference is if you are using shiny you're actually going to run the document it essentially treats it as pre-rendered so it renders the flexdashboard then brings in the shiny component if you don't have shiny there and you're just doing static or um html widgets like crosstalk in flexdashboard then you can just render it like anything else so we've talked quite a bit about all that thank you for kind of sticking with me through a quick run through a bunch of data products the next taxonomy i'd like to cover is the idea of a control document so this is kind of a step beyond the classical ones the first two people have probably done a lot of that control documents are some that i had not done as much of before i'd done some research so the goal here is to modularize your data science tasks and use r markdown to control your code flow that seems kind of clinical in terms of like code flow data science tasks like what are we talking about here part of what we're talking about here is you know taking something and generating something new with it so parameters in r markdown can be added to the yaml header with params colon and then whatever you want the object to be named so in this case like species equal to adelie when i render this document it will run the code in here but then it will pull in whatever the
parameter is so without changing the body code i can change what the actual report does so i can say for the adelie penguins generate this uh you know these different filters and this different graphic and i can just change this one line of code in terms of this parameter and it will generate an entirely new report or entirely new document that is specific to that you can have multiple parameters i'm just showing the bare bones example of one parameter with one component you could have five or six different parameters or more and you could have multiple different things within that parameter so you could have like a string or a vector inside there and kind of run through all the different ones the idea is that without having to change all the code and rewrite 10 different reports you could generate 10 reports just by changing the parameter 10 times so very powerful you can also reference r files so for a lot of folks you're like ah you know like r markdown it feels like a lot of you know there's a lot of different power there but you know i have to write all these different things i'm just going to stay in a dot r file um you can use and reference external dot r files a couple of different ways so you can always just source an entire r file and just read in things from an external file but the knitr read_chunk option kind of gives you the best of both worlds here you might see that i have code chunks that are named in terms of this is smaller penguins and this is plot penguins but there's no code in there there's nothing in here and that's because with read_chunk i'm actually able to read in from a dot r file specific chunks so this pound sign pound sign and four dashes essentially creates a chunk label so even though i'm inside dot r files r markdown can understand this descriptor and say okay everything inside here is the smaller penguins chunk everything inside here is the plot dash penguins chunk so i can actually
write things in a dot r file reference them in multiple different r markdown documents you can imagine i have a xaringan i have pdf i have word doc and html all of those could reference dot r files so that you're just writing the components in r markdown that are specific to r markdown like how to create the presentation and you can source some of the other more complex code you're writing in r from the dot r file just an option not mandatory but definitely possible the other idea is child documents so i could have a core kind of parent document here that is just an html document it's got a couple library calls in terms of loading tidyverse and palmerpenguins and then just like before with the read_chunk i have a blank uh blank chunk here there's no code in it but it's saying for this chunk read in the child report adelie report dot rmd and that will take everything in this so the you know this code chunk this text this code chunk and embed it into this portion so essentially it's as if you wrote all this and put it into the parent document again the idea of like you're able to build up different components and reuse parts from different documents in kind of a parent or a core document that you want to reference them in so again if you're playing around with different output formats you could always just you know write a document and change the output format but sometimes you may want to actually leverage a specific report that gets brought in this becomes even more powerful if you think of like conditionally adding a report so for my example it's the same report that we had before so we're bringing in this child report the parent report now has a conditional statement essentially if this object is chinstrap or if it's adelie then bring in this report so that will basically say as long as this is chinstrap or this is adelie then render the adelie report and bring it in the better example and the kind of more complex example is if a
criteria is matched so if uh values are above or below a certain value that is concerning then bring in all this other code run it and generate the kind of worrisome report there that you want to catch like um the you know the data is way off from what the norm is oh my gosh what's going on add all these different things to check the data quality that you aren't necessarily doing all the time but you're doing it here that being said please check your data and check data quality but just as an example you can do these kind of if else or conditional statements with the child document another idea of like parent and child documents is blastula blastula can be used with things like gmail or you can use it with rstudio connect to send emails the core idea and there's a bit more code here we're loading some libraries we're creating some graphics and then we have this render email portion it will basically take a blastula written email so you can see that in this penguin email the output is blastula::blastula_email and render that as an html email and then send it the example i'm using here is rendering it and sending it via rstudio connect you can also send it via smtp which is a protocol that like gmail uses so again connect is not required to do this it's a nice integration but you can do this for free with your smtp style servers these emails can include arbitrary r output like ggplot or even you know tables or images of tables you can basically include a lot of different things inside an email that you're sending out to your colleagues or to yourself i use these a lot for do this thing and then tell me what the output is in a summary so we covered those and most of those kind of in my mind the code is covered here in terms of all we're doing is pulling in code and rendering it so i'm not going to go through the live examples i did include an entire section inside the penguin project so control documents has blastula emails it has emailers programs
external files using child using conditional child you're welcome to walk through those again at bit.ly slash rmd marvel but because it's just rendering the same document over and over i'm going to spend the next 10 or 15 minutes talking about our next topic which is going to be templating i really love the idea of templating as well so for templating a lot of people think of like taking one thing and making it other things or taking one thing and changing something about it to make it different so here captain marvel is working with her friend's child and changing the look of her outfit so that's really cool and you could change you know what an r markdown document looks like with a different theme or different css but more of what we're talking about is don't repeat yourself generate input templates or output documents from code so if you think about parameters that's something we used before was you know defining a parameter setting it as species equal to adelie and then we generate this report obviously you could go into the r markdown delete this type out you know chinstrap or gentoo or one of the other species and click render or you could do it from the command line or the r console so with rmarkdown::render you can take this report so the penguin dot rmd and define the parameter as something else so here a list of species equals gentoo instead of adelie and now this will generate that report with code with that new parameter so now you've generated two reports with the same report essentially so two different outputs from the same input so that's very powerful a more interesting example in my mind is multi-renders where you actually take all the different possibilities of the parameter and generate all of them so here i'm wrapping rmarkdown::render into my own render function so that i pass in penguin my render function takes the same report it replaces the species with whatever penguin i put in and then it creates an output file with
the name of that penguin dash report html so now i can find all the distinct penguin species i can pull that column out and then i can render that with purrr walk so it basically goes through each of the different possibilities and generates a new report for all of those so now i have an adelie report a chinstrap and a gentoo report all from one set of starting code you can imagine that this could be something like i'm changing the year or changing the month or changing the day to generate this report data or for like a modeling task i'm changing hyperparameters or the different modeling function or a different data set and generating an entire report on it so you can do a lot of different things with parameters and looping through different parameters now the other idea is rather than generating 10 reports out you might want to generate multiple things in one report so here i'm doing something similar but all within one report so i'm sourcing an external uh file this is multi-plot which is an r function that basically outputs markdown text it outputs a plot and prints it and then i add some blank spacing and then i have a loop here so this is a code chunk called loop output i'm using results asis so it just pastes the literal output here of what this r code generates so it actually outputs essentially text i'll take the penguins data set find the distinct species pull the species and then walk through again with purrr um this function three times what that does is generate three different versions of the same text and the same graphic but nuanced or kind of specific to whatever the input was whatever the penguin that we're generating was so let's uh walk through that one real quick so we're in templating i'm going to show a couple things so penguin params i'm going to pull this over again the species is adelie so i can knit this document pardon me let's let that document render or knit and then i'm going to pull it into the browser so this is a pretty bare bones
report but you can see 193 are classified as adelie the distribution of adelie penguins are below and there's a big chunk of them that are out of line we go in here and change this to chinstrap i can again knit that report and get the essentially new one out so if i refresh this it should be chinstrap i think let's see yep there's chinstrap chinstrap and now we have 276 penguins instead of 344 so i can do it that way i can do it with knit with parameters so if you just click up here on knit and go down to knit with parameters it will pull up like a kind of shiny component where i can change this and do gentoo so now we can knit that to gentoo so it's going to do the same thing and render a new report for me with gentoo so here's the report we're looking at uh pull it over there so that's possible um a couple different options you can knit and just change it you can knit with parameters but the more powerful part that i like is the multi-plot or render all so render all is the idea of like rather than manually changing things or going through all the different options i can again use my render function if i go up here and get all those my render function here we go so this will render the specific documents but i want to render all of them together again this is the same thing i showed in the presentation of here's the template the penguin params and then we're going to set the species equal to chinstrap that's fine with the penguin names we're going to pull those out what this basically gives us is the three different penguin names and then we're going to walk through those with the uh purrr walk render function to just prove to you that that actually happened i've got some code here to show you that this was run at uh 3:52 whatever the time is on the server on august 26th and it happened about a second later was when they were actually generated so these reports were generated just now and it gave us the adelie report the chinstrap report and
the gentoo report so that's super powerful and i think that's a really really cool idea the loop within a doc r markdown is also really really fun so here this is the one that's using the um reports up here in terms of it's going to walk through that multi-plot that i've already defined and it's going to create three different outputs in the same document so let's knit that and see what it looks like when we render this it's actually reporting out asis and then it's going to generate this loop within the doc html let's take a look at it so for adelie it gives you a level three header and it gives you um these observations so here's 151 observations of adelie penguins and the flipper length is 190 for gentoo it's got a different graphic and different numbers and then for chinstrap a different graphic and different numbers so from one dot r file you're able to generate multiple outputs within a single document so again a slightly different idea the last idea i'll talk about is whisker versus glue so uh the way i like to think about this is one versus two if you're thinking about these uh these brackets on the outside of these parameters glue is logic templating in terms of you can take a string put anything you want with valid r code inside these brackets and it will interpret that and execute it inside that context so very very cool very strong package whisker is logic-less templating meaning you have to define the object ahead of time and then pull it in it also uses two brackets instead of one the reason why it's logic-less is that it's really thinking of it just as text as opposed to code up here so if you try to do the same number of rows of mtcars it will just print a blank there because it can't interpret that so it just returns blank um where whisker is very powerful is the idea of generating actual rmd or dot md files like basically generating new input files or text files from an input file that kind of blew my mind the first time i
thought about it so let's walk through an example real quick if i have this report so this is an actual r markdown document but you can see i've got these double brackets in a few different places so the name of the penguin the name of the species and some long prose here now you could think well i could just replace this with params but the beauty is when you use whisker to fill in this document which you see on the right so adelie penguin adelie and then this long prose about them it will actually um allow you to still work with the document it's still a raw dot rmd file so that's really powerful and you can continue working with it so this is what that function looks like that i was using use penguin template which reads in this file fills it with whisker basically like replaces all of the different double bracket placeholders with species equal adelie and this long prose and generates that output so let's show that really quick um whisker versus glue uh sorry let's do my whiskers use penguin template all right so here we're going to define that function species long prose and we're going to use this use penguin template we've now filled the blank template with this really long text string and a bunch of other stuff and now we can actually navigate and edit that file inside rstudio so now we have an r markdown document that i can continue adding to here is more text but i've programmatically filled it with different parameters of interest so there's certain times where this is amazingly powerful there's certain times where you should probably just use params but the idea of like if you want to be able to continue editing it like if you're generating a xaringan document and putting all this code and filling some parameters but then continue editing the slides this could be a really powerful option um so we've talked about all the different things you can do inside templating the last component here and we've only got a couple minutes i
think that was part of the intention is that all the things i've showed up to this point have literally been all the amazing things you can do with r markdown for free and within your own kind of choices this last component is a bit about r markdown within rstudio connect so rstudio connect is a professional product from rstudio um it's a hosting and execution platform for shiny r markdown plumber as well as things like jupyter flask dash and streamlit for our python folks you can use it to execute or schedule r markdown for basically anything so not only can you host things there but you can actually have connect re-render or re-execute this code in the future so you could provide parameterized r markdown where someone can actually go through change parameters and re-render it without having to know anything about r you could schedule etl jobs and pull from sql with dbplyr apis with httr or spark with sparklyr and then bring those into r and do other things with them connect provides reports that actually have logging and history so you can have multiple versions of the same report over time you can schedule some of your long-running training steps that take three hours and have them run at three in the morning so that when you come in in the morning at eight or when you log in to your computer all that's been done overnight and you've got your batch scoring done and ready for you in the morning and then with blastula that we mentioned earlier you can actually create and send emails conditionally or on a schedule so here we've got it scheduled for 9:36 a.m. central standard time i want this report to run every day and publish the output and send an email just for one quick second i'll show you what this looks like i actually scheduled this report to run every hour for the past few hours so you can see i've got a couple emails here it's been sending me so we've got an actual blastula email that has an attachment of a pdf a csv file and a
powerpoint file all this was generated with r markdown it gave me an embedded ggplot an embedded ggplot and an embedded table and this was run with rstudio connect over and over and over uh with different data and sending me these different parameters so this is the power of like all these things you learn are amazingly powerful on your local machine if you want to go a step further and have like authentication for these documents for sharing within the enterprise or scheduling or some of these other things that are worth paying for then you can look into rstudio connect and do some of these amazing things that being said you could also send emails like this from gmail or from an smtp server and you don't need rstudio connect but what rstudio connect is providing is the scheduling the authentication the logging the history some of those things that a lot of our enterprise customers are concerned about so that was a whirlwind tour we got it done in exactly 60 minutes in this fun experiment we did together so thank you for tagging along with me listening to me talking in my uh my 200 words per minute voice thank you to alison hill for defining that um a bunch of different follow-up links there's some articles that people in the community have written about how to use r markdown to be very powerful in your enterprise so emily riederer and sharla gelfand have really great articles on this i have one covering the same idea of like a taxonomy of r markdown there's two entire textbooks about r markdown the cookbook for different examples that may answer some of the questions that folks had in the chat that i didn't get to and then the definitive guide which shows all the possibilities of what you can do with r markdown we also have a few different webinars that people in the community again have showed like here's how we're using r markdown or other things within our enterprise or within our specific domain so maybe you come from the insurance or finance
industry this report talks about rethinking reporting with automation nice video from the community there using r to enhance clinical reporting within the life sciences industry so maybe you come from uh life sciences or healthcare this has some clinical data and clinical applications and one of my favorites that's a bit more general is avoid dashboard fatigue a lot of that is talking about how to go beyond just creating these kind of dead dashboards and doing things like scheduling doing things like creating rich outputs in different ways to engage with your users i know that rob or whoever is working with me in the chat is also going to be dropping some links to some follow-up material so he will be dropping off a kind of summary of everything that talks about some of the packages and has links out to these other things i talked about and if you want to look at the code i used and you know riff off of it or adapt it or just play with it by yourself you can go to my github at github.com slash jthomasmock slash penguin dash project or bit.ly slash rmd marvel that's the short code and i will drop that in the chat again that will take you again to the same page i'm on here with all the code oh and i misspelled it apologies bit.ly so let's try that again see if we can get it rolling there we go so my apologies correct link there and you can have fun with it and again i know that we covered a lot here i try to be a bit more motivational with some of these things in terms of like here's the things you can do and then giving you examples to build off of i find the way that i learn the best is from these projects or doing project-based learning where i can take something that i know is working in a minimal way and expand it to be useful for me and my job uh do something in a different way that is useful or a new way that i wasn't able to do it before so go out have fun with r markdown you have all this amazing power available for free that you can riff off of and
change uh so go out there use it and have a good time i'll stick around for a little bit and try to answer some more questions as they come in but that is the end of the core presentation itself and uh thank you for tagging along we are doing this as an experiment so if you could take some time and actually go through and fill out the google form that was just dropped in the chat uh please do that that will help us understand like what other types of things y'all would like to see in the future where we can make improvements where i kind of slipped up or where kind of things could be better uh so thank you all for attending thank you for your time today and uh everything was fantastic i think so thanks for your time i'll try and answer a few of these different questions um oh also for anyone that's still here this is howard this is the bonus howard slide um it's almost the end of summer here in texas not really but uh this is my dog howard who loves watermelon so that's something to take home and run with and have a good time with um let's do some of these questions i think i answered the one about rmarkdown::render and flexdashboard but again if you're using a static flexdashboard you can just rmarkdown::render it just as you would anything else if you have a shiny component you will then run the document as a shiny app because you have to have that server component for parameters how can we get a list of options with an api endpoint that's really interesting i assume that you could pull in something if you had like an api and you were pulling in data with say like the httr package then you could pull that in i'd have to think about using an api as a way to generate parameters and i don't want to kind of speak off topic so i'll try and do something in a follow-up blog post for that and we'll release that is there a good way to output a specified set of dashboard views to a pdf so you can use the webshot package that is i didn't have enough time
to talk about that today but the webshot package allows you to take screenshots of html and then embed them somewhere else so you can imagine that html is amazing html is what i think you should be creating because you can do so many different things with it and make it interactive but sometimes people are like i need a pdf or i don't have internet access i need to have this or i want to print it out a couple different use cases with a pdf you could actually use webshot and print different html views like a flexdashboard to a pdf there is a blog post that we have on the rstudio blog about webshot that covers that and we can find the link and webshot2 so this package uh for printing html this is a really powerful package for doing all sorts of great things like basically just taking screenshots so here's a screenshot of the um cran and the r statistical project home page uh from r 4.1 and the base r pipe so pretty exciting stuff and just being able to take images of that so you can basically pass any html into that it doesn't have to be a web page it could be locally generated html as well uh can you ask knitr to read several dot r files absolutely so you could either source or knitr read_chunk multiple times so you can imagine i have 10 different dot r files you could read all those into a parent r markdown document and execute the code there what are the best practices to produce word documents from r markdown let's go back here so for word documents from r markdown there's kind of two ways of doing it you can do it with r markdown proper there's also what's called the officeverse that is a really powerful environment for just working with office things you can imagine powerpoint uh excel and word like the kind of microsoft office stack you can do a lot of those different things with r and with r markdown um there are certain limitations or certain things you have to work harder to do but i think that you can still generate you
know useful things from r markdown into word to do those when do you think someone should use r scripts over r markdown files um i try and almost always use r markdown files if i'm doing anything that is useful for that so anything where i'm creating an output or i'm trying to like capture my train of thought i know there are times where folks are like well you know i have to send a dot r file out to a high performance computing cluster um that's fine like you can use r and a dot r file and you can still do useful uh workflows where if i'm in a dot r file i can create like a section label or something so with command shift r um i can do you know cool label and this will put a separator there where i'm doing labels or i can do pound pound four dashes to also create a section label what this does is not only leave running comments about what you're doing but if you do want to bring these into r markdown in the future now you can use knitr read_chunk to pull those in usefully into an r markdown file that being said i find that working in r markdown especially with the visual editor so the ability to not only just you know see the raw plain text or work with the plain text but also have the rendered output and the ability to like interject all sorts of different things like insert a level one heading like the amazing things you can do with rstudio and r markdown are worth it although dot r files can do the exact same with uh r functions does blastula run inside a docker container on a virtual linux server does it have limitations on the size of the email attachment so blastula like anything else could run inside a docker environment um the way that docker works or sorry the way that blastula works is either you locally generate an html file and send it via smtp so i actually like render an actual email and then send it via gmail for example or i use rstudio connect to render it on a server somewhere rstudio connect doesn't
require Kubernetes or Docker. It just runs on a traditional bare-metal server, like on premise, or in a virtual cloud environment, so you don't need Docker in that workflow. If you wanted to use Docker with blastula you could, but there's no requirement to do that.

Limitations on the size of the email: I do think that the size of the email is more limited by your server or client for sending emails, or your web client. I know that, for example, I think Gmail has a size limit on images, so if you try to send a gigabyte of data via email, Gmail's just not going to let you do that. That's not a blastula thing, that's just a Gmail thing. But basically anything you can create in R you can attach to an email, so PDFs, Excel files, CSVs, RDS files, images, entire reports, or HTML files; those can all be attached.

Things that are inline in blastula, and I think we actually have a control document, I can show this. We're way over, but we're just going to have fun with that. Let's look at a blastula email. That's not the one I wanted. Penguin emailer, there we go. So when I actually render the email, this is a preview that you can run with the package. This has not been sent, but it shows you what the email would look like as a preview. So this is an embedded ggplot. It's literally inline, so in the actual email that you send to someone, it'll have this text and it will have this image. However, some clients are very aggressive about stripping HTML, so if you tried to, say, do a table with gt, that might not render exactly as you want, not because gt is doing anything wrong, but just because your email client can't parse HTML of that, you know, complexity. So what you can do instead is attach an image directly, and that will typically work in any of the clients. The one addendum there, though, is that attaching tables as images does remove all the metadata of a table and essentially makes it an image, like a graphic, so it might be better to attach a report and just say, like, here are
some quick summaries there. But for a lot of email clients you can still embed tables within them.

All right, I think that's actually all of it. We still have around 400 people here, so thank you for spending some time with me and going a bit over. Have a wonderful weekend, be safe, you know, enjoy R Markdown, have a good time, and we will see you next time. Tom logging off, closing down a lot of things, and have a good one.
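
[Editor's sketch] The answer above about reading several .R files and labelling sections in a script can be illustrated roughly as follows. This is a minimal sketch, not code shown in the stream: the file name analysis.R, the chunk labels load-data and summarise, and the variables inside the script are all made up for illustration. It assumes the knitr package is installed.

```r
# Hypothetical script: the file name, labels, and contents are illustrative only.
script <- file.path(tempdir(), "analysis.R")
writeLines(c(
  "## ---- load-data ----",   # section label: two pounds, then four dashes
  "x <- c(1, 2, 3)",
  "",
  "## ---- summarise ----",
  "mean_x <- mean(x)"
), script)

# Option 1: run the whole script, as a parent document could do with source().
# A separate environment keeps the script's objects out of the workspace.
env <- new.env()
source(script, local = env)

# Option 2: register the labelled sections with knitr. In an R Markdown
# document, a chunk whose label matches one of the section labels and whose
# body is empty is then filled with the code from that section.
knitr::read_chunk(script)
```

In an R Markdown file, the read_chunk() call would normally sit in a setup chunk, followed by empty chunks labelled load-data and summarise wherever that code should run. Repeating the call for each script is one way to pull several .R files into a single parent document.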
Info
Channel: RStudio
Views: 15,073
Rating: 4.9891305 out of 5
Keywords: rstudio, data science, machine learning, python, stats, tidyverse, data visualization, data viz, ggplot, technology, coding, connect, server pro, shiny, rmarkdown, package manager, CRAN, interoperability, serious data science, dplyr, forcats, ggplot2, tibble, readr, stringr, tidyr, purrr, github, data wrangling, tidy data, odbc, rayshader, plumber, blogdown, gt, lazy evaluation, tidymodels, statistics, debugging, programming education, rstats, open source, OSS, reticulate
Id: WkF7nqEYF1E
Length: 73min 15sec (4395 seconds)
Published: Thu Aug 26 2021