[MUSIC PLAYING] SEAN: Thank you all for taking
some time out of your day. We're going to start by looking
at the RMarkdown document. And how we can use R and
Python together inside of that document with
some special new magic that's available in the
latest versions of RStudio. We'll also talk about how
once we finish that document, it can be published and
shared in a reproducible way. So that others can view
it and perhaps, you might introduce some
lightweight automation. We'll look briefly at
how a similar outcome can be achieved with
Jupyter notebooks using shared infrastructure. And then, we'll talk
about how the model we're creating in this
document can actually be put into place so
that decisions can be made based on that model. And we'll look at
how you can maybe do that with an
interactive app or something like a RESTful API. And then finally, we'll
talk about how those APIs and applications can
also be developed using some of the new options
inside of the RStudio toolset. So we'll go ahead
and jump in and set a little bit of context. I am using RStudio on a server. And so that means I'm
going to access RStudio through a web browser. And there are a couple of really
nice benefits to doing this. The first is that I have more
resources available to me than I would on my laptop. And specifically, if I go
in to start a new session, you'll see we're taking
advantage of a tool called Kubernetes that allows me
to specify what resources I want for this work. Another benefit of
working on a server is that it means everyone
on my team, including Lou and myself, is working from
a common playbook. And in this case,
we're explicitly defining our environment
through a Docker image where our session
is going to run. Even if you're not using
Kubernetes or Docker, by having your work
on a server, everyone has a common home,
which helps make collaboration a lot easier. And then finally, because
we're on a server, we're a lot closer
to our data, which means our work can scale
in an easier fashion. Now, in a moment I'll be
jumping into the RStudio IDE. But while I'm here, I
just want to point out that one of the
things we've done in the last year for
multilingual teams is added the ability
for other editing tools to be accessed from
this same interface. So it's almost like
a Workbench, where you can pick up the right
tool from that Workbench regardless of what language
your current project demands. And that means for multilingual
data science teams, everyone has all
those benefits I just described of the
common playbook, regardless of what
editor they're using. And for IT, it's really nice
to have this common front door where these
different environments are accessed. So we'll go ahead and
jump into RStudio. And at this point,
I want to point out that everything you're going
to see inside of RStudio IDE even though I'm using
the RStudio server is going to be available in
any flavor of the RStudio IDE. So whether you're using the
open-source desktop or RStudio Server open source or the
professional product like I am, you'll have access to
the interoperability that we're about to walk through. So to set the stage
just a little bit more, I'm doing my work here inside
of an RMarkdown document. If you haven't seen
RMarkdown before, it's basically a format
where I can combine prose, so what I'm thinking, along with
my code and then the output of my code. But what's new in the
latest version of RStudio is something that we're
calling the visual editor. And what that means is that even
though I'm editing an RMarkdown document, on the fly I'm
getting a live rendering of what this document looks like. And so that means I can do
things a little bit easier than I used to. So, for example, if I
needed to put in a table, I could insert a
table really quickly. I can add Markdown
to that table and it will be rendered on-demand. I can even do things like
insert citations or emojis. And so I can use all these
different tools inside of the visual editor. And whatever I'm doing
is going to show up as a real-time preview. And the reason why I
think this is so important is that regardless of what
language you're working from, ultimately, a data scientist's
key job is to communicate. And so being able
to write effectively in a tool that
supports really rich technical communication is vital
for any type of data scientist. And I found that
this visual editor makes things that
used to be challenging for that type of technical
communication really easy. One of my favorites
is I can go in here and actually insert an image. I don't have to worry
about figuring out where that image lives on disk. And I can even resize
it interactively. So we hope that regardless
of what language, this visual editor is going to
give you a head start as you're writing down what you're
thinking and documenting your process. And it really combines the best
of a Jupyter Notebook, [INAUDIBLE] IDE, and RMarkdown. And if you haven't
seen this before, inside the latest version of
RStudio, all you need to do is click this button in the
upper right hand corner. And what that then does
is flip us back and forth between the plain
text source code that is still available here
perfect for version control and that pre-rendered
view of the document. All right. So that's where
we're doing our work. It is inside of
this visual editor. What are we going to do? We have a made-up
data science exercise. And so like most data
science exercises, we're going to start by
pulling in some data. And this data is
coming from a database. And again, something that
we're excited about regardless of what language you use is
the ability within RStudio to seamlessly manage
connections to databases. And so here I have a connection
to a database called content. There's a number of different
tables inside of it. And without switching to
a different SQL editor, I can really easily preview the
schema of this database table and even preview some of
the records if I want. So I have this database
and I'm starting out by writing some code
that's going to operate within the database to
do some basic transforms. And so if you haven't seen
this before, using dplyr, you can write R code, and that R code actually
gets executed as SQL. And so that's what's
going on here. So we're creating some
data that we're then going to do an analysis on. So let's take a quick
look at that data. Essentially, we have
spatial-temporal data. And so we're looking
at bike-share data from the D.C. area. So those bike shares,
if you've ever been walking around the city
and you want to rent a bike, you can usually find a
station with bikes lined up and then you pay
some money to be able to ride the bike around. And so the data that
we have is how many bikes were available
at any given station at any point in time. So we have this time series
of bikes that are available, then we also have a
spatial component of where those bikes were located. So it's a pretty cool data set. And what we want to
try to do is forecast into the future
for a given station how many bikes
might be available. Say I
commute home at 5:00 PM: is there going to be a
bike there that I can take? And so we'll get started
by doing that in R. And the first thing
that we're going to do is create a test and
training data set. And because we're working
with time-series data, it's really important
that we don't accidentally use the future to
predict the past. And so, in R, we've created
a test and training data set, where the training data
set occurs entirely before the test data set. And now that we have
these two things, we can go ahead and
build our model. And I'm using a new R package
or an ecosystem of R packages called tidymodels. There's a lot of
great resources online if you want to dive
into modeling in R. But essentially, what
this set of packages is allowing us to do is
pre-process our data. And so, we have here
latitude and longitude. That's the spatial element. We have the date and time. That's the temporal element. But we've used some
preprocessing tricks to create a factor for
the day of the week and also add some information
about whether a given day was a holiday. So we have a nice, rich
data set to do modeling on. And then, the first
thing that we might do is create a model in R. To do that, again within
the tidymodels ecosystem, I'm essentially
creating a workflow. That's going to take our
model preprocessing along with this gradient
boosted engine and fit a simple
regression model. So we can go ahead and do that. And if we look at
the results here, we have a predicted
number of bikes along with the actual
number of bikes, kind of what we'd expect. And we can evaluate
how our prediction did. So one thing that we can do
for that quick evaluation is plot the prediction
versus the reality. And if our model is really
good, all of our data points would fall along
this y equals x line, where predictions match reality. You can see that we have quite
a lot of variance around that line. We could look at the
R-squared value of our model and see that it's not very good. That's why I'm paid to give
webinars and not fit models, but this hopefully
gives you a sense of what your workflow might
start to look like in R. Because this is
a tree-based model, we can even look at feature
importance, and so you can see that our
model is emphasizing the location of our
bike share station and not putting as much weight
on some of those factors that we created. So we have this iffy model in
R. What do we want to do next? Well, we could try a
lot of different things to improve our model. One thing we might want
to do is try something from the Python ecosystem. And so that's where the
heart of this webinar begins which is,
how are we going to interoperate Python and R
inside of the same context? Well, the first
thing that we'll do is create a Python code chunk. And inside of that
Python code chunk, one of the things that we're
going to do right after that is import some packages. So I'll go ahead and add the code
to bring in these packages. And we can do a quick
sanity check here to run the code
chunk and it appears that we have these packages
loaded inside of Python. Now, I can already see in
the chat that many of you are asking how is
Python managed? Where are these
packages coming from? Those are really
great questions. So in the latest
version of RStudio, you can use a project
or global option to specify what Python
interpreter should be used in the
context where you're mixing R and Python together. And so for this project, I've
created a virtual environment and I have selected
and told RStudio to use that virtual environment.
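For readers who want to reproduce a project-level environment like this, Python's standard-library venv module can create one. The sketch below is an assumption about how such an environment might be built, not the exact steps used in the demo: the .venv location and the create_project_env helper are hypothetical. RStudio can then be pointed at the returned interpreter, for example through the Python section of Project or Global Options, or through reticulate's RETICULATE_PYTHON environment variable.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_project_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir and return its Python interpreter path.

    create_project_env is a hypothetical helper written for this sketch.
    """
    # clear=True rebuilds the environment from scratch if the directory already exists.
    builder = venv.EnvBuilder(with_pip=with_pip, clear=True)
    builder.create(env_dir)
    # The interpreter lives under Scripts\ on Windows and bin/ on Linux and macOS.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    # Hypothetical location; a real project might keep .venv next to its .Rproj file.
    env_dir = Path(tempfile.mkdtemp()) / ".venv"
    # with_pip=False keeps this demo fast; a real project would keep pip so packages can be installed.
    python_path = create_project_env(env_dir, with_pip=False)
    print(python_path)
```

The returned path is what a shared server setup would hand to RStudio, so that everyone opening the project resolves the same interpreter.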
RStudio is actually aware of the multitude
of Python installs that are available
on this server. And that's one of those
benefits of working on a server is that if Lou and I were
collaborating together we'd have this common
understanding of what Python installations are available. So that's where the Python
engine is coming from. The Python packages themselves
are coming from a tool called RStudio Package Manager. In our case, our
server is not online, because we have some
sensitive data that lives in our environment. So we're not able to
reach out to the internet. Instead, Package Manager
acts as this intermediary, where we can install Python
packages from a specifically governed mirror of PyPI. And so that's what
you're looking at here-- I can search for packages,
I can see information about those packages,
how to install them, what they depend on. But something really
special about this mirror is that it has a
safety net that allows me to time travel as well. And so if I ever
got into a situation where my Python
environment wasn't working, I could go backward
or forward in time to reinstall packages from a
specific point in the past. So that's where these
packages are coming from. Let's go ahead and
start writing some code. And this is where I think
things get really magical. Because within this
Python code chunk, inside of the
RMarkdown document, I actually have
access to everything that I've done up
until this point in R. And you can actually
see the IDE here is helping me autocomplete
the R objects that are available as attributes of this special
r object. And so what does that mean? Well, I'm going to
cheat a little bit and grab some code that
I've written ahead of time. We'll just put this
inside of our code chunk and then I can walk
you through it. So the first thing I'm
doing, because we're going to fit this
model in Python, is to load some
training and test data. But you can see that I'm
not starting from scratch. I'm using this magic r
object to actually pull in the training and test
data that I already created. And then, I can use that
as a jumping-off point for fitting my model. And so we're going to do a
little bit of pre-processing with pandas and then
we're going to use some functions from scikit-learn
to fit another type of model. And one of the things that
is really interesting, and why we might
jump into Python here, is that scikit-learn has
native support for time-series cross-validation. So remember, I said we can't use
the future to predict the past. Scikit-learn knows how
to handle that even in the case where you're
doing a whole bunch of cross-validation splits. And so that's what we're doing
here to fit an SVR model. I can go ahead and run
this Python code chunk and we can take a
look at the results. All right. So the mean of the train results,
that's the mean of our cross-validation
R-squared values. And you can see,
whoo, it's really bad. OK. So I'm not a great
Python model fitter. But hopefully, you get the idea
that we can really seamlessly reuse things back and forth. And you might have noticed
a couple of other things that the IDE is
doing to help me with this Python development. So one of the things
that I briefly mentioned is that autocomplete. And so say we wanted to
use a different model from scikit-learn. You can see the
IDE is providing me with all the different
options that are available, all the attributes and
methods of these objects, and the help for those methods
and the arguments that they take are also all going to
be available in the IDE with the rich auto-completion
that you'd expect. The other thing
that I'll point out is if we look at
the environment pane when we switch to
the Python context, we actually get a Python
environment explorer as well. And so these objects that I'm
creating inside of Python, you can actually see them and
explore them inside of the IDE and I could even
preview them as well. So this becomes
a really rich way to interactively do your
work and debug problems. And we can see that even features
like the IDE's filter and
sort capabilities work for these Python objects. I can also switch at
any time back and forth between R and Python
because the two ecosystems are coexisting
together inside of this notebook. Speaking of which,
there's one last trick that I want to show you, now that I've fit this Python model.
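The Python model fit described above can be sketched roughly as follows. This is a hedged reconstruction, not the code from the demo: it uses synthetic hourly data in place of the bike-share tables, hypothetical column names (hour, dow, n_bikes), and a StandardScaler in front of SVR because SVR is sensitive to feature scale, which is a choice the webinar does not state. In the R Markdown document, the training frame would instead arrive from the R session through reticulate's r object.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the bike-share data (hypothetical columns).
# In the R Markdown chunk this frame would come from R via the r object instead.
rng = np.random.default_rng(0)
times = pd.date_range("2021-01-01", periods=500, freq="60min")
hours = times.hour.to_numpy()
df = pd.DataFrame({
    "time": times,
    "hour": hours,
    "dow": times.dayofweek.to_numpy(),
    "n_bikes": 10 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, len(times)),
})

# Chronological holdout: every training row precedes every test row,
# so the future is never used to predict the past.
df = df.sort_values("time")
cutoff = int(len(df) * 0.8)
train, test = df.iloc[:cutoff], df.iloc[cutoff:]
features = ["hour", "dow"]
X_train, y_train = train[features], train["n_bikes"]

# TimeSeriesSplit grows the training window forward in time and always
# validates on the block of rows that comes right after it.
model = make_pipeline(StandardScaler(), SVR())
train_results = cross_val_score(
    model, X_train, y_train, cv=TimeSeriesSplit(n_splits=5), scoring="r2"
)
print("mean CV R-squared:", train_results.mean())

# Refit on the full training window and score on the held-out future block.
model.fit(X_train, y_train)
test_pred = model.predict(test[features])
print("holdout R-squared:", r2_score(test["n_bikes"], test_pred))
```

Swapping SVR for another scikit-learn estimator only changes the make_pipeline call; the TimeSeriesSplit object is what enforces the past-predicts-future ordering during cross-validation.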
I have some Python predictions. What if I then wanted to go
and do some further analysis? Well, you saw how I can reuse
R objects in the Python context, but I can do the same
thing inside of R. So inside of R there's this
magic object called py that gives me access to
everything that I've created so far in the Python space. And so I can go ahead
and use that object to do something like this. So I grab some code
from my cheat sheet, place it in this chunk. I'm essentially taking the
Python model predictions and plotting them with ggplot2. And you can see
a little bit here why our model fit is so bad. It's because we're not capturing
nearly any of the variance that is going on inside of our data. All right. So we've fit this model that
combines R and Python together inside of a notebook. What do we do now? Well, at RStudio,
we're big believers that you should share your data
science work early and often. This is the best way for
people with domain knowledge or stakeholders to validate
that you're on the right path. And so one way to do that
that's really powerful is through publishing. And within RStudio,
we can publish-- you see this blue icon here-- and we can publish to
a number of places. But within a team
enterprise setting, our recommendation is to
publish to RStudio Connect. And so I'll go
ahead and do that. You can see the IDE
identifies my dependencies. And what happens
when I click Publish is that a reproducible
unit is created here that contains not
only my code, but also things like the version of
R, the version of Python, the different packages that
I need to reliably reproduce this document. And so I'll go and
show you the end result here on
RStudio Connect, which is that same rendered notebook. So we have kind of
all the information that I have been working up to
this point, but in the context where I can easily
share it with others. So I can specify who
should be able to see this. And because it's that
reproducible unit, at any point, I can reliably
refresh this document to regenerate the results,
or I could do something like set up a schedule. So my data might
change over time, but I want this notebook to
render on a regular basis. And I can set all of that up
because the dependencies are intertwined in a reproducible
unit with this document. Now you might be
thinking, that's great if you're an RStudio user,
which probably most of you are. But what about my colleagues
on the Python side? Well, the same option is
available for sharing your work early and often if you're
coming from Python. And so I want to quickly show
you what that looks like. In the RStudio IDE we click
that blue Publish button. And inside of Jupyter
Notebooks, you can click that blue
Publish button as well. But I want to show you
a slightly different way that you can quickly and
easily share your content. And that's to import content
from a system like Git. So what I have here
is a Git repository. And that Git repository
contains all of the example code we've been talking about today. And it's a public
repository, so you can go ahead and play around
with this as well if you like. What I'm going to do is
take this Git repository and tell RStudio Connect
to import that content. And so I have different branches
that I can choose between, and then this
repository has a number of different directories. I'll go ahead and import
the Jupyter Notebook. I'll give it a title here. And what happens when we
import this content is the same thing that we saw
in interactive publishing. The environment this content
depends upon is recreated. So I have a reproducible unit
of work around my notebook. If I go and open
up that notebook, I can see the results. So I can share this, even
with a non-technical user, who might be intimidated by the
standard Jupyter Notebook interface. Here, they just have a
really clean HTML document that they can read. And a data scientist
can specify who should be able to
see this content, but also do those
same things I was talking about for
scheduling or re-rendering the content on demand. So we have our notebooks. We've shared them
early and often. That's great. We've gotten this
domain feedback. And maybe we iterate
a couple of times and really improve that model. What do we do with that? How do we ensure that the
model is actually being used to generate decisions? That's kind of the key task
of the data science team. Well, there are two ways that we
think about this at RStudio. One is to influence how decision
makers are making decisions through your model. And a great way to
do that is through interactive applications. So many of you might be familiar
with tools like Shiny, which allow you to do
that in R. But we've worked hard in the last year to
ensure that multilingual teams have that same capability. So what you're looking at
here is a Dash application, which is an interactive
application written in Python. And what this application
is going to allow us to do is help stakeholders
understand our model. So they can come
in here and click through different
stations and see the forecast and the
location of the station. So it's a pretty simple app. But hopefully, it kind
of gets your wheels turning that even if you're
a Python data scientist or you work with
Python data scientists, they have the same ability
to impact decisions by creating interactive content. And we see examples of that
with Dash, with Streamlit, with Bokeh, with a wide
variety of interactive Python frameworks that are supported
through the RStudio stack. So that's how you might
go about influencing a person with your model. But what if you need
to use your model to make a whole bunch
of automated decisions, or if you need to influence
not a person, but a service? One way that's common
to solve that problem is by creating an API. And again, there's options for
doing this in R and Python. So on the R side, we
have tools like Plumber. On the Python side, we
have tools like Flask. And both of these can
be shared just as easily through RStudio Connect. And so I'll just quickly show
you what that API might entail. Again, we have all
of the same controls. And so we can look at the
logs of this Plumber API. We could do things like scale. If we know we're going to
have thousands of requests to this API at the
same time, we can specify how we want the system
to handle those requests. But at the end of the day,
the idea is pretty simple. It's that anyone can
come in and pass parameters to your model-- in this case, we're
specifying the station that we want to make
a prediction for and the time horizon that we
want to make a prediction for-- and then those inputs
are passed to our model and the results are returned. So here, we have the
results of our forecast. But as you can see, that
passing of inputs and outputs is done in a way that
machines can understand. So here, we have
the output in JSON. And just above that,
we have the request that our interactive exploration
of this API would generate. So other systems or services
or software engineers can take advantage of
your model at scale. All right. So we've covered
quite a bit of ground. We've talked about how to make
these models through notebooks. We've talked about
the different ways that you can
enhance those models and put them into production,
but didn't actually show you the code for either that
Dash application or this API. So that's the last thing
that I wanted to do is talk a little bit about
how you can get started writing this type of code. And so if I go back
to the RStudio IDE, if you're an R user,
my recommendation is to just click
New File, and you'll see a bunch of
options, two of which are Shiny web applications
and Plumber APIs. So those are going to get
you started with creating either web apps or APIs. If you're a Python user,
inside of the RStudio stack we looked at
Jupyter Notebooks. But if you want to do
this type of coding for APIs or applications,
you're probably going to need a little
bit more robust editor. And you can use either
JupyterLab or VS Code for that purpose. If I open up Visual Studio
Code, the last thing I want to show you here is
just how easy it is to deploy an API or an application. Inside of VS Code,
again, I'm working off of that shared common
Python environment. So it's easy for me to
collaborate and automatically get the same Python environment
as all my colleagues. I have my code here
for a Dash application. And then all I need
to do to deploy is use a utility that we've
created called rsconnect. So this is just a Python package
that you can install wherever you're
writing Python code. And it has commands to help
you then take that Python code and wrap it up in that
reproducible context. So for example, I'll do
rsconnect deploy dash to a server called Dev. And it's going to run
through and identify all the dependencies
of this application and then give me the
link to the deployed app. And if we follow
that link, you'll see the exact same
bike share application we were looking at before. So to recap, we covered
quite a bit of ground. We created that document using
R and Python and some really cool RStudio magic. That document was shared
in a reproducible way. We talked about how
Jupyter Notebook users can do the same type of
early and often sharing, and then how we can use that
model to impact decisions, either through apps or APIs. And finally, how you might
go about writing those things using some of the new
features in RStudio Workbench. With that, I will
hand things over to Lou, who's going
to bring us home. And then we'll have the Q&A. LOU: Thank you, Sean. That was great. So, while Sean was talking,
I was taking a look at a lot of the questions
coming in via Slido. There are a number
of questions there that have been
upvoted, and Sean covered a lot of
that material in his demo after those questions came in. But we'll get to as
many of those as we can. So, Sean showed off a number
of different things here. For the data
scientist, he showed how you can use these two
languages closely together without a lot of overhead. So the data scientists
can use each language for its own strengths. He also illustrated some
of the different IDEs that can be used,
allowing data scientists to use their preferred IDE
again, making that easy. And we now support, in
addition to the RStudio IDE, of course, Jupyter and VS Code. Visual editing of R
Markdown is a great advance. Again, making the
user experience, the developer experience for
data scientists much easier. And to answer one of the
questions in the Q&A, that visual Markdown is
available in the open source version of the RStudio IDE. So a number of different
ways that data scientists can use R and Python to
deliver these wow results to the rest
of the organization. For the DevOps
and IT teams, using these centralized
environments makes it easy to support these common
tools for both R and Python, and to operationalize both
languages without doubling the work. And by making both of these
languages easy to use together, it helps data science
leaders really optimize their team
for the people, not for an arbitrary
choice of a single language to better enable
collaboration within the team and with their
stakeholders, and to access these wider
talent pools to hire new data scientists into their team. And finally, for the
business stakeholders in the organization,
ultimately, they don't care about what the
underlying language is. They just want to have
reproducible, accurate, understandable, data
science insights that they can use to help
make better decisions. So by sharing this data
science work through platforms like Connect, they can access
these up-to-date interactive analyses and dashboards, or
get the information directly in their email, so they can get
the answers they need in order to make better decisions. All of these capabilities are
supported by the RStudio Team set of products,
which together combine to provide a single home for R
and Python data science teams. Again, RStudio Server Pro is
the centralized environment, allowing data scientists to
use R or Python to analyze data and create these data products. RStudio Connect is a platform
to publish the results and make them
available to business users and other stakeholders,
using R or Python-based data science products. And RStudio Package Manager
to manage open source packages for both R and Python. One of the questions
that I saw in the Q&A expressed
pain around how difficult it
is to manage packages in the Python ecosystem. And we've recently added
support for managing packages from PyPI to RStudio
Package Manager to help address that
exact pain point. And I'll ask Sean to comment
on that in the Q&A section. RStudio is, of course, used by
millions of people every week using open source software,
things like the IDE and the Tidyverse and Shiny,
critical open source applications that we create as part of
our open source mission. But RStudio is also used by thousands
of active commercial software customers, including over
half of the Fortune 100 and many well-known brands
such as the ones we see here. We also have been
really gratified to hear from our customers
via TrustRadius.com. So if you are an
RStudio user, we encourage you to
go to TrustRadius, check out the RStudio profile,
read some of the reviews that people have left there,
and add your own review, because we read every
single one of these. We try to respond. And certainly, this
is one of the ways that we hear from our users. We've gotten great
feedback from our customers on combining R and Python
in a single platform, and how it helps them
collaborate among their team, and to essentially
allow these teams to, as the second reviewer says,
make use of their preferred language for data analysis
so that they can create and publish products via
RStudio Connect using both R and Python to share with
their internal clients and stakeholders, as the
third reviewer shows here. Now we've talked a lot along
the way about our pro products, but I want to
emphasize that our core mission is to engage and support
the R and Python community. And we do that in a
number of different ways. The most important of course,
is creating the open source software that our
users use every week. But there are a number of
different ways we do it. We support RStudio
Community site, allowing R users
and Python users to gather and ask
each other questions and get answers to
those questions. That's a great resource. We do our annual conference. This year it was a
virtual conference, under current circumstances. But that was just a
couple of weeks ago. And we were very gratified
by the engagement there. Check out our blog. There's a link
there to all the videos from
RStudio Global. Many different speakers
from all around the world. Those are all free to watch. We had a tremendous amount
of positive feedback on that conference, both directly and on social
media. So I encourage you to
check out those videos. Our education team is
focused on supporting R education.
We do a lot of train-the-trainer
capabilities, providing training materials, et cetera. So if you're interested in being
a certified RStudio R trainer, check out our Education page. We're also supporters
of the R Consortium, a multi-vendor group
to support and advance the infrastructure
around the R language, as well as a platinum
sponsor of NumFOCUS, which provides tremendous support
for the Python ecosystem, among other projects. And then finally, Ursa Labs-- we've been a supporter of
Ursa Labs from the beginning. And Ursa Labs is
devoted to developing cross-language capabilities
such as using the Apache Arrow Project to provide
access, both within R and Python to
those capabilities. And it's important
to emphasize how our open source and pro
products tie together. Of course, it's
our core mission, as I said, to
contribute open source software to the community. And we spend over half of
our engineering resources creating this free and
open source software. As the data science community
adopts open source software, this drives adoption
within larger enterprises and commercial customers. These commercial customers in
turn buy our pro products, which are focused on helping
scale out and operationalize open source data science. And by buying our pro products,
that provides RStudio the funds so that we can sustain
our ongoing open source work. We call this idea
the virtuous cycle, this idea that we're
supporting our mission to deliver free and open source
software to the community by selling the pro software to
the enterprise companies that need those features. So if you'd like some
more information, we have a number of
different resources. Again, these slides
and the recording will be sent out within a
few days after the webinar. RStudio.com/Python is your one
central portal to get to a lot of this information. We also recently published a blog post
recapping all the Python-related
features we added to both our
open source and pro products over the last year. So I encourage you to
take a look at this. If you'd like to set up a
time to talk to us one-on-one, you can use this URL here,
rstd.io/r_and_Python to learn more, set up a meeting,
get some answers. The webinar recording will
be available on our Resources site. If you'd like more
technical information-- and a number of the questions
in the Q&A were looking for more technical details-- check out some of these links:
the reticulate package website, as well as some
examples and a webinar on that topic. On our
solution.RStudio.com site, we've also got a number of articles
providing deeper information on how to integrate Python in
RStudio Server Pro and RStudio Connect. Again, those are all accessible
through the top-level RStudio.com/Python portal. And our community
site is a great place to ask questions about R and
Python, open source and pro. Now, going through the Q&A,
the most popular question was: as an R user,
how can I learn Python? What would you recommend
to experienced R users? I did a quick poll of
our education team. And these couple of
books floated to the top. Python for Data Analysis
and Python Data Science: both of these books were
recommended by our education team as being more
data-first, as opposed to programming-first. I will try to get a
few more recommendations to add to the slide before we
share it with the participants. SEAN: And I would
add to that, Lou, that after this
webinar, we'll be sharing information on
the RStudio community page. The Rstud.io/RPyQA link, the link
that is currently bringing you to the Slido with
all the questions, will be redirected to
that community thread. And we would actually
love for all of you to give input to
answer that question as well, because there's a lot of
diversity in how people learn. We know these
communities are made up of people from
lots of diverse backgrounds. And so if you have something
that's worked really well, we'd encourage you to reply
to that community thread. We'd love to open source
the answer to that question beyond us at RStudio. LOU: That's a great point, Sean. Thank you very much. So with that, Sean, are there
any particular questions that you'd like
to kick off with? SEAN: Yeah, absolutely. I think one of the most
common questions was, what parts of that demo are
available on the open source side? What parts are part of
the professional products? And so apologies if I
didn't do quite enough signposting there to delineate. Essentially, everything that
you saw inside of the RStudio IDE-- so the visual editor for R
Markdown documents, the ability to combine R and Python inside
of an R Markdown document, selecting what Python
interpreter to use, the Python objects in
the environment pane, even the Python
REPL-- those are all going to be in that open
source desktop IDE, regardless of what version you use. And in fact, I would
encourage folks to look at the reticulate
website that talks a bit more about some of the
options that I didn't dig into for combining R and
Python in that open source way. One of the questions asked, if
I just have a Python script, can I use that in RStudio? And the answer is absolutely. That's something
that I didn't show, but it is available in
the open source IDE. So I'd encourage
folks to go there. The things that were specific
to our professional products would be the selection
of different editors from that common workbench as
well as the deployment of work to RStudio Connect. I would encourage folks-- there
were some folks who were saying, "I work at
a research institution," or "I'm teaching at
a university." If you would benefit from some of
those professional capabilities, I would encourage
you to reach out. We give a lot of the
professional products away for free
if you're teaching, and they're substantially discounted
for research as well. And so hopefully
that helps answer that kind of common
question: what can I do on my own today? Everything inside
of the RStudio IDE. What would be part of the
professional products? That would be anything
you saw in terms of sharing or the
different editors within the RStudio workbench. LOU: Thanks, Sean. So one of the other really
popular questions-- and I alluded to it
earlier-- is this idea of the challenges of package
management in Python. And this is an area
I'm particularly excited about because of the
recent addition of PyPI support in RStudio Package
Manager, initially in beta. Would you like to
comment on that at all? SEAN: Yeah, absolutely. So I would tend to agree
it is a bit of a mess. That's certainly been
my experience. Right now,
RStudio sits on top of the many Python management
tools that are available. And so if you already
have a tool of choice-- you saw in my demo, I was
using a virtual environment-- if you're using something
like Conda or Poetry or pyenv,
RStudio will sit on top of all of those options. And in fact, the key
engine behind all of this is an open source package
called reticulate. And within that package, you
can see the different functions that help the IDE and R identify
what Python environment to use. That's also something that
you'll see in the Options menu that I mentioned,
where if you go inside of the RStudio IDE and
click Tools, Options, basically any of those Python
environments, whether they're from Conda or virtualenv,
are going to be available. Now that doesn't
necessarily mean that the headaches go away. And so one of the things that
we're working on in the future is extending the support
for creating and managing those virtual environments
from within the IDE as well. And so you can look out
for that on the horizon. And then, as Lou mentioned,
on the professional side, if you think those headaches
are challenging as a single data scientist, for a team of Python
users they can often present even more of a challenge, which
is one of the reasons why we're investing, for those commercial
teams, in a package management repository that supports
both R and Python to help make that work
reproducible, to help the IT folks say what
packages should be allowed, as well as that time
travel capability that I briefly showed. So that's, I think, a
long-winded way of saying we can all commiserate with the
challenges in Python dependency management. We're going to continue to
invest and make it better. But all of this work-- we're standing on the
shoulders of giants here-- is available to you
from within the RStudio IDE. LOU: And I just want to add
another plug: RStudio's own Alex Gold, who's also part
of the Solution Engineering team, is going to be doing a
webinar in a couple of weeks on the challenges of
package management and how to address them. We're going to be doing a
series of blog posts between now and then, talking about the
package management problem. So I encourage anyone
who's interested in that to check it out. Now Sean, I've got a
favorite question next. But is there anything that
you want to jump to before I toss that one out? SEAN: So there was one
question I really liked. It's a very RStudio
question, which is asking, what are the limitations? And we try to be pretty
upfront with that. So I'll just throw
out there that I personally use the RStudio IDE
when I'm combining R and Python together. I have friends who
use the RStudio IDE for all their Python work. I have other colleagues who use
VS Code for their Python work. So our common
theme is that the tools should be subservient to you
as a data scientist, and not the other way around. So pick what works for you. But some of the
specific limitations that you might run into
today, which we're investing to make better,
start with the creation of
Python environments-- I mentioned RStudio
sits on top of that, but it doesn't really give you
tools yet for creating Conda or virtual environments. And then the other
limitation I would call out is that the debugger
inside of RStudio today is still
pretty R-oriented. And so if you're
spending all day kind of writing a long
Python application and you are using the debugger
as a critical element of that, I would tend to recommend
something like VS Code as maybe a better option. So that's my favorite question
talking about limitations. What was your favorite question? LOU: Mine was a closely
related question, which was-- I can't find it
now on the list-- but there was something
to the effect of "can we now use
Python within the IDE without a lot of shenanigans?" So-- and my view on that
is that the most recent release, RStudio 1.4, has lowered the bar
of necessary shenanigans considerably. Do you want to comment on that? SEAN: Yeah, I would agree. As I said, I do have some
colleagues, especially those who know R and
are learning Python for the first time, for whom it can be
really nice not to introduce yet another editor. It can be challenging
when you're trying to learn a new language
to also be learning a new tool. And so I think if you're
someone who knows R, that barrier to entry
is low enough now that you can get started using
Python right within the RStudio IDE. Then if you want to graduate
to another editor, that's fine. But at least you're not learning
both those things at once. And I'd echo that
for educators who are out there as well. Someone asked, what
language should I teach? Well, I think the
key is to teach what you're going
to be comfortable teaching to make sure that
your students have a really effective data science
experience from day one, that they're not stuck
fighting their tools before they're able to
create their first plot, whether that's in
matplotlib or ggplot2. You want to give them that
gratifying moment early on. And it's my belief that
a lot of folks at RStudio have done a ton of awesome
work to ensure that the RStudio IDE isn't going to
present that hurdle and isn't going
to fight students. If you are a teacher interested
in using R and Python, I'll also do a quick shout-out
to RStudio Cloud, which can reduce that hurdle even
further by allowing folks to start writing code without
installing anything on day one. LOU: And that's a good
segue to a question I wanted to answer about
RStudio Cloud. We had a question on whether
RStudio Server runs on premises or in
the cloud. With RStudio Server Pro, typically
most of our customers will install
the server themselves, but where they install
it varies considerably. It could be on-prem
and often is. Or it could be in a virtual
private cloud on any of the major cloud providers. And so our customers do both. We also have
marketplace offerings for our RStudio Server Pro
on all three major clouds. So you can search for RStudio
in those cloud marketplaces. And that's a quick
way of spinning it up. And then of course,
RStudio Cloud is, as Sean just mentioned,
a way of getting started with similar functionality
without having to install anything, and it's
a hosted service purchased on a monthly basis. There was also a question from someone who has a
license for RStudio Server Pro: is the Launcher
available, or does that require the enterprise flavor? The Launcher is actually-- Sean, let me-- I thought I knew
the answer to that. And I caught myself. Could you clarify? SEAN: Yeah, I would
say for those kinds of specific questions,
or if you have questions about the professional
products, our sales team is happy to help. That sounds like a cheesy ad. But I can tell you firsthand
at RStudio, our folks that work with our
customers are all, in their own right, really
talented data scientists. And they'll be able
to help you navigate some of these questions. And we'd be the first
to say you don't need a professional product. The open source stuff will work. Or we can help you solve some
of the challenges that come up. So I think that's how I
would end things there, Lou: we'd love for you to
dive into the resources, dive into that community thread. And then if you are encountering
some of the challenges that we presented
throughout this webinar, feel free to reach out to us. And we are happy to help you
go through that path as well. [MUSIC PLAYING]