>> Hey, data scientists, are you tired of having to open a new browser to go and
analyze and run your data? Don't you just wish you could
stay inside Visual Studio Code forever and continue using
your same data there? Well, rejoice because now you
can with Jupyter Notebooks, and you'll find out more on this
episode of Visual Studio Toolbox. [MUSIC] >> Hey, everyone. Welcome to another episode of
Visual Studio Toolbox. I'm your host, Leslie Richardson, and joining me today to talk about
Visual Studio Code instead of good old VS space is Claudia Regio from the Python
team. Welcome, Claudia. >> Hi. Thanks for having me. >> What exactly is
a Jupyter Notebook? I mean, it sounds cool because
Jupyter and space and stuff. Great name. But can you
tell us more about it? >> Yeah. Sure. A Jupyter Notebook
is an interactive document, and it mixes executable code, visualizations, equations,
and markdown text. I can even go ahead and show you
how to open one up in VS Code. >> Sweet. Why should somebody
use Jupyter Notebook? >> Jupyter Notebook has become the de facto tool
for data scientists. It's the ability to
write some code, run it, see your result immediately, tweak that code again, and see the results immediately. It's this ability to
have this playground of visualizations that makes it really conducive for data
scientists and their work. >> Cool. I like that
word playground too. It just implies experimentation
is encouraged and stuff. >> Definitely with this tool. Let me go ahead and show you how
we can open one up in VS Code. We can bring up the command
palette in VS Code by pressing "Control Shift P" or "Command
Shift P" if you're on a Mac, and you can just type in
create new blank notebook, that will be the first one up here. When I select this, you're
going to get a drop-down of some options if you're working
with different types of notebooks. I'm going to go ahead and
select the Jupyter Notebook. >> Now, are all those capabilities
in place in VS Code by default, or is there anything that you have to install or use on top of that? >> No. Basically, everything is
going to come with the extension. That's where we differ a
little bit from JupyterLab. For example, when you get
started with that tool, you have to go and download all of the extensions that you would
like to use within that tool, whereas we're trying to
make it such that all of those really cool extensions
and great features, they all live within this one tool. Once you have the Jupyter extension, you should have all the
goodies you want and need. >> Sweet. >> This is a Jupyter Notebook. Let me show you around the
components really quick. In here, we have our input. This is where I might write
something like x equals four, and then I'm going to go ahead
and press this "Run" button. This is going to tell
me x equals four. I can then add another
code cell using little buttons right here
that hover between two cells, and then I can create something like, go ahead and print x for me. As you can see, I'm also
getting some docstrings telling me here what parameters are accepted within these functions, which
is really, really nice. When I'm working, I
don't have to go to Google, I don't have to start checking, "What is this function
here," which is really nice. As you can see here, I
have this output of four. But if I go back and
change this value to 10, you'll see that my output is also going to change in response to that. >> Cool. That's super easy
from the looks of things. You don't even have to
have a main function or whatever the
[inaudible] in Python is. I'm not super [inaudible] in Python. >> Definitely. >> That's really cool. Really,
it's like a playground. >> Yeah. As you can see, you
can imagine adding more cells, taking some cells away, having visualizations, tweaking them. You get the idea of how these
components all work together. Let me go ahead and
show you a little bit more than what we have in this
very blank notebook here. For the sake of time, I've gone ahead and run some
of these ahead of time. As you can see here, I have the very well-known,
well-loved Titanic dataset. This is a very popular one
amongst data scientists. >> Sweet. >> Yes, everybody's going to
know this when they see it. >> For those who are
unfamiliar like myself, is it, like, the ship Titanic,
or what kind of data is it? >> Yeah. This is all Titanic data, it gives you things such as, actually, I'm not going to tell
you, I'm going to show you. >> Okay. Cool. >> I'm not going to tell
you. If you're new to this, I'll show you how you
can find it yourself. >> Sweet. >> Right here in the top right, what I just clicked on was
the Variable explorer. That's actually going to give
you a representation of all of your variables that you
create within your notebook. As you can imagine, with the playground I mentioned, you're
probably making x, x2, x_test, which is what I see here. It can get really
easy to lose track of those variables, what they
are, and what state they're in. With
the Variable explorer that we created, you can keep track of them. It gives you a preview of the type, the size, and a quick value here. >> That's awesome. So it doesn't matter what scope these
variables are in, they're just all present in one spot? >> Correct. Yeah. It's per
notebook, not per cell. There's no like I need to
be within scope to see it. We're going to store
them all for you. >> Got you. That's pretty nifty. >> As you asked, let
me show you the Titanic data frame here. Here we go. If we want to take a
deeper look at this one, for example, I can go ahead and
hover over this left icon here. This will open up the data viewer. What [inaudible] actually is, it's an Excel-like
representation of your data, so this works with tabular data. Essentially, what you can
see here is this is pclass, this is sex, age, fare. This is a Titanic way. This
is telling us [inaudible] >> It's like a manifest, right? >> Exactly. >> Cool. >> The age, how much they
paid for their ticket, what port they embarked
on, unfortunately, did they survive or
did they not survive? >> You've got a hunt
for zero dollars. Is that Leo? Because
he gambles his way on? >> Probably. Leo gets
everything for free. >> Yeah. >> Essentially, with
this data frame, one of the most common
tasks that data scientists get started with is
predicting, based on inputs, whether somebody on the Titanic survived or, unfortunately,
did not survive. >> Okay. That's cool. It's an interesting first task. >> But the cool thing
about the data viewer is it gives you this ability to
do some really quick checks. For example, to make sure
that my data is clean because most data scientists
know you do not get clean data, you get data, you got to clean it. You are in charge of that,
you have to make sure it's right before you start
working with any models. Something I might check
really quick is to see, are any of my ages negative? We know that shouldn't exist. As you can see, I've put in the
filter here for less than zero, no values, no row entries, which means good to go. >> Sweet. So you can put in complex conditional
statements in there, and I'll check just
[inaudible], right? >> Yeah. We're working with more
complex stuff, but right now, we have the most simple ones
for age greater than five, things like this equal to. We're also going to
be adding not equal to support [inaudible] as well. Just quick checks to make sure
that your data is all in order. >> That's pretty nifty.
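[Editor's note: the "are any of my ages negative?" check described above can also be written out as code, which is roughly the typing that the data viewer filter saves. The following is a minimal sketch in plain Python, not the extension's implementation. The sample rows and the helper name rows_where_less_than are hypothetical; only the column names pclass, sex, age, and fare come from the episode. In a notebook the data would more likely live in a pandas data frame.]

```python
# Hypothetical sample rows; only the column names come from the episode.
rows = [
    {"pclass": 1, "sex": "female", "age": 29.0, "fare": 211.34},
    {"pclass": 3, "sex": "male", "age": 22.0, "fare": 7.25},
    {"pclass": 3, "sex": "male", "age": None, "fare": 0.0},
]


def rows_where_less_than(rows, column, threshold):
    """Return the rows whose value in `column` is present and below `threshold`."""
    return [
        row for row in rows
        if row[column] is not None and row[column] < threshold
    ]


# Same idea as typing "< 0" into the age column filter.
negative_ages = rows_where_less_than(rows, "age", 0)
print(len(negative_ages))  # no matching rows means the column passes this check
```

[An empty result here corresponds to the "no values, no row entries" outcome shown in the data viewer.]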
What kind of tools are data scientists using when
they're not using Jupyter Notebook? Because just based off the
stuff you've shown so far, it seems like it would be an extremely useful
and more legible tool to be able to see all the
contents that you need to know. >> They're really using Notebooks. That's the reason why they've
become the de facto tool. Things like regular Python
scripts while they can be great, they just don't give you
that flexibility and that ability to iterate, to continuously go through a
change-and-check process, which is really, really common in data science. >> That's really cool.
Also, looking at the main page that you were showing with all the data and
the different sections, it looked like a Wikipedia
article almost. I like the legibility of
that playground and stuff. I'm sure that's useful for a lot of other people
like your team members and other fellow data scientists
you want to share the information with too, right? >> Yeah. Definitely.
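[Editor's note: the sectioned, article-like page described in this exchange comes from a notebook being stored as an ordered list of cells, some markdown and some code. The following is a minimal sketch, assuming the version 4 notebook file format (.ipynb files are JSON). The markdown heading is hypothetical, and the two code cells reuse the x = 4 and print(x) example from earlier in the episode.]

```python
import json

# A notebook file is JSON: an ordered list of cells plus some metadata.
# Cell contents are illustrative; the markdown heading is hypothetical.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Getting started\n", "Assign a value, then print it."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["x = 4"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print(x)"],
        },
    ],
}

# Serialize the structure the way it would be stored in a .ipynb file.
ipynb_text = json.dumps(notebook, indent=1)

# List the cell types in order to show the mix of prose and code.
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)
```

[Because the file is plain JSON text, it can be shared like any other source file.]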
Otherwise, for the things that you can
do in the data viewer, you would have to be
writing code: you'd have to write something
like, select this column, then check the count
of anything less than zero. You have to write code
to check those things. The data viewer just makes it a lot
easier to give you a shortlist. So at least, if you're writing
code to fix your data, clean it up, you're not wasting your time for a bunch of other columns that don't need it right now. >> Got you. That's pretty cool. I can
also see this being used as an instructional tool almost. Just looking at the different
segments of code snippets, can you write dialogue
that explains what each snippet does and is
going to return data wise, and then the student
can play around with it and walk through essentially
a full-blown tutorial? >> Absolutely, yeah. There's a lot of educators who are actually
moving towards this tool, especially if you're an educator for data science. I can show you an example
really quick as well. Even when I took my data
science certificate, a lot of teachers would
give you some notebooks. You have some markdown, and you might say something like, "In this section, we will go over
data imports and basic stats." Let's do that. I can run that, I can render it, and then
you can see here that, as an educator, hey, this
is where we're going to be discussing this type of
code or things like that. >> Nice. That seems so simple too. But that's really cool. It makes me want to
get into data science. >> Quick and easy. >> Yeah, very quick and easy. Are there any limitations that Jupyter Notebooks currently has that your team is
working to improve on? >> Any limitations that we're
working on improving on. I wouldn't say
necessarily limitations, at least off the top of my head, maybe if you give me a
day, I'll give you a list. No, not at this time. >> Cool. >> Because there aren't
any, I want to be clear. Just because [inaudible]. >> No worries. At first glance, it looks like there's
so many powerful things that you can do with that. >> Yeah. >> With that then, what's
next for Jupyter Notebooks? >> What's next for Jupyter Notebooks. I can actually talk
about one feature that we just enabled on the VS Code side, and people have been requesting
this for a really long time, so I would love to mention it. We have support for Multi select, which we haven't had before
in our last release. Let me show you an example
of what that looks like. If I scroll down here, I can select this cell. This is just one selection, you can tell by the
border on that cell, but if I want to select another cell, I can hit "Control" and
click this next cell. That'll select both of these for me. I can do "Control" and
click another one, or even just use the traditional "Shift": with this one selected, I hold "Shift", click, and
I've selected four cells. Then you can [inaudible] actions
with those which used to be a huge nuisance because
we couldn't group cells. >> Sweet. That's some good
stuff. What else you got? >> Now, one of the features
that's on its way, which actually piggybacks
off of Multi select, is what we call Smart select. Let's go to the bottom
of this notebook where I have a couple of models. I've decided I actually want to
move forward with this model here. This is a Naive Bayes Classifier, it was the most accurate one,
I want to go with this one. I can actually go ahead
and click "Smart select". It's going to be surfaced
within this cell toolbar here. Not quite yet, but
you guys are getting a sneak preview of where it will be. If I were to click "Smart
select" here for this cell, I would start to see that highlighted
background on this notebook, and it's going to select
all the cells that I need that are required to generate
that cell and that cell's output. Once I have those actually selected, users will be able to
take other actions. You can run those cells, you can export them, you could even merge them, you could turn them into a function. It's basically going to
give you a shortcut through your code when you may not
necessarily need all of it. Again, coming back to that whole idea that the notebook is a playground: by the time we're done with it, there's probably like five
different models in there, and when we decide to go with one, this is going to help with
that cleanup process for you. >> Awesome. It's the little things. >> It's the little things. It behaves similarly to a feature
that we already have today called Gather, which cleans up
at the line level as opposed to the cell level.
Gather is an experimental
extension that we have, so it's something you have to install from the VS Code marketplace. But if I were to hit "Gather", which you actually see here
today on this particular cell, it's going to generate
another notebook, and it's going to take just
the lines of code that are necessary to create that last
cell that I gathered on. >> Cool. Sweet. >> Also, it helps me
clean the process. >> Awesome stuff. To wrap things up, you also mentioned I think
something about Live Share, is that going to be in the near future with
Jupyter Notebooks somehow? >> Yes. We already do have some preliminary
support for Live Share. You can see your
counterpart's notebook, you'll be able to add and run cells, and they're just working on
finalizing the other goodies. There's already
some support for it, and we plan on that being a fully supported, out-of-the-box
experience for users. So collaboration will be a
lot easier in Notebooks. >> Awesome. Nice to know
that data scientists aren't being left out of the
collaboration experience. >> Absolutely not. >> Sweet. To wrap things up, where do people get started
or how does one get started? What do I need to
download or install? >> Great. Thanks for asking; this is
my most important part. For the specific experience
you're seeing today, you'll have to install
VS Code Insiders. You can just Google VS Code Insiders, and just download
the Python extension if you're working with Python. All the stuff I've gone through
today is available for Python. If you're not working with Python, you can just download
the Jupyter extension, but all of the goodies right now
are for the Python extension. >> Good stuff. Well,
I hope everyone's really excited to
try those tools out. Anything else to close this out? >> No. Just give it a try and if anybody has feedback for
me, they can reach out. Hopefully I can put my e-mail up somewhere and somebody
can give me feedback. We're open to
listening to everything you guys have to say to make it better for you, so
please get in touch. >> Awesome. Also, I do have one
more question because VS Code has such a wide community
in open source and everything. Is there a way to make your
notebooks go public so that other people can see
each other's notebooks? >> Yeah. Right now, a lot of people are
using GitHub for that. They're creating repos.
As you can imagine, we have our open source repo. People can actually contribute to the project if they want as well. I always give that plug if
people want to contribute, but we recommend just using GitHub. There's a pretty wide community
there of people sharing notebooks and goals with
notebooks that they have. >> Exciting. Cool. Well,
thank you so much for sharing all that really cool
information about the beautiful playground that
is Jupyter Notebooks, Claudia. >> Thanks. Appreciate it.
Thanks for having me. >> Yeah, thanks for being here. Until next time, happy coding. [MUSIC]