Getting Started with Jupyter Notebooks in VS Code

Captions
>> Hey, data scientists, are you tired of having to open a new browser to go and analyze and run your data? Don't you just wish you could stay inside Visual Studio Code forever and continue using your same data there? Well, rejoice because now you can with Jupyter Notebooks, and you'll find out more on this episode of Visual Studio Toolbox.
[MUSIC]
>> Hey, everyone. Welcome to another episode of Visual Studio Toolbox. I'm your host, Leslie Richardson, and joining me today to talk about Visual Studio Code instead of good old VS space is Claudia Regio from the Python team. Welcome, Claudia.
>> Hi. Thanks for having me.
>> What exactly is a Jupyter Notebook? I mean, it sounds cool because Jupyter and space and stuff. Great name. But can you tell us more about it?
>> Yeah. Sure. A Jupyter Notebook is an interactive document, and it mixes executable code, visualizations, equations, and Markdown text. I can even go ahead and show you how to open one up in VS Code.
>> Sweet. Why should somebody use Jupyter Notebook?
>> Jupyter Notebook has become the de facto tool for data scientists. It's the ability to write some code, run it, see your result immediately, tweak that code again, and see the results immediately. It's this ability to have this playground of visualizations that makes it really conducive for data scientists and their work.
>> Cool. I like that word playground too. It just implies experimentation is encouraged and stuff.
>> Definitely with this tool. Let me go ahead and show you how we can open one up in VS Code. We can bring up the command palette in VS Code by pressing "Control Shift P" or "Command Shift P" if you're on a Mac, and you can just type in create new blank notebook, that will be the first one up here. When I select this, you're going to get a drop-down of some options if you're working with different types of notebooks. I'm going to go ahead and select the Jupyter Notebook.
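To make the "interactive document" description concrete, here is a minimal sketch of what a notebook file holds on disk. It assumes the standard nbformat 4 JSON layout for .ipynb files; the cell contents are made up for illustration and are not taken from the notebook shown in the episode.

```python
import json

# A notebook file (.ipynb) is a JSON document. This dictionary follows the
# nbformat 4 layout: a list of cells plus some metadata.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            # Markdown cells hold the explanatory text.
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Titanic exploration\n", "Notes go in Markdown cells."],
        },
        {
            # Code cells hold executable source and, after a run, their outputs.
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["x = 4\n", "x"],
        },
    ],
}

def cell_types(nb):
    """List the type of each cell, in document order."""
    return [cell["cell_type"] for cell in nb["cells"]]

# Serialize to the text that would be stored in the .ipynb file.
notebook_text = json.dumps(notebook, indent=1)
print(cell_types(notebook))
```

Saving `notebook_text` to a file whose name ends in .ipynb (the sketch deliberately does not write one) should give a file that notebook editors can open as a two-cell notebook.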
>> Now, are all those capabilities in place in VS Code by default, or is there anything that you have to install or use on top of that?
>> No. Basically, everything is going to come with the extension. That's where we differ a little bit from JupyterLab. For example, when you get started with that tool, you have to go and download all of the extensions that you would like to use within that tool, whereas we're trying to make it such that all of those really cool extensions and great features, they all live within this one tool. Once you have the Jupyter extension, you should have all the goodies you want and need.
>> Sweet.
>> This is a Jupyter Notebook. Let me show you around the components really quick. In here, we have our input. This is where I might write something like x equals four, and then I'm going to go ahead and press this "Run" button. This is going to tell me x equals four. I can then add another code cell using the little buttons right here that hover between two cells, and then I can create something like, go ahead and print x for me. As you can see, I'm also getting some docstrings telling me here what parameters are accepted within these functions, which is really, really nice. When you're working, I don't have to go to Google, I don't have to start checking, "What is this function here," which is really nice. As you can see here, I have this output of four. But if I go back and change this value to 10, you'll see that my output is also going to change in response to that.
>> Cool. That's super easy from the looks of things. You don't even have to have a main function or whatever the [inaudible] in Python is. I'm not super [inaudible] in Python.
>> Definitely.
>> That's really cool. Really, it's like a playground.
>> Yeah. As you can see, you can imagine adding more cells, taking some cells away, having visualizations, tweaking them, you can get the idea of how these components all work together.
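The two-cell demo above can be imitated in plain Python. This is only a sketch: the `run_cell` helper and the `kernel_namespace` dictionary are hypothetical stand-ins for the notebook kernel, and it assumes both cells are re-run after the edit, which is what makes the printed value change.

```python
import contextlib
import io

# One shared dictionary stands in for the kernel's state, so a variable
# assigned in one "cell" is visible to every cell run afterwards.
kernel_namespace = {}

def run_cell(source):
    """Execute a cell's source in the shared namespace and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, kernel_namespace)
    return buffer.getvalue()

run_cell("x = 4")                   # first cell
first_output = run_cell("print(x)")   # second cell prints the current x

run_cell("x = 10")                  # edit the first cell and run it again
second_output = run_cell("print(x)")  # re-running the second cell sees the new x

print(first_output, second_output)
```

The design point is the shared namespace: cells are separate pieces of source, but they all read and write the same state, which is why no main function is needed.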
Let me go ahead and show you a little bit more about when we have this very blank notebook here. But for the sake of time, I've gone ahead and run some of these ahead of time. But as you can see here, I have the very well-loved, well-known Titanic dataset. This is a very popular one amongst data scientists.
>> Sweet.
>> Yes, everybody's going to know this when they see it.
>> For those who are unfamiliar like myself, is it like the ship Titanic or what kind of data is it?
>> Yeah. This is all Titanic data, it gives you things such as, actually, I'm not going to tell you, I'm going to show you.
>> Okay. Cool.
>> I'm not going to tell you. If you're new to this, I'll show you how you can find it yourself.
>> Sweet.
>> Right here in the top right, what I just clicked on was the Variable explorer. That's actually going to give you a representation of all of your variables that you create within your notebook. As you can imagine, like I mentioned, the playground you're probably making x, x2, x_test is what I see here. It can get really easy to lose track of those variables or what they are, and what status they're in. With the Variable explorer that we created, you can keep track of them, it gives you a preview of the type, the size, and a quick value here.
>> That's awesome. It doesn't matter what scope these variables are in, they're just all present and in one spot?
>> Correct. Yeah. It's per notebook, not per cell. There's no like I need to be within scope to see it. We're going to store them all for you.
>> Got you. That's pretty nifty.
>> As you asked, let me show you here the Titanic data frame. Here we go. If we want to take a deeper look at this one, for example, I can go ahead and hover over this left icon here. This will open up the data viewer. What [inaudible] actually is, it's an Excel-like representation of your data, so this works with tabular data. Essentially, what you can see here is this is pclass, this is sex, age, fare. This is a Titanic way.
This is telling us [inaudible]
>> It's like a manifest, right?
>> Exactly.
>> Cool.
>> The age, how much they paid for their ticket, what port they embarked on, unfortunately, did they survive or did they not survive?
>> You've got a hunt for zero dollars. Is that Leo? Because he gambles his way on?
>> Probably. Leo gets everything for free.
>> Yeah.
>> Essentially, with this data frame, one of the most common tasks that data scientists get started with is predicting, based on inputs, whether somebody on the Titanic survived or unfortunately did not survive.
>> Okay. That's cool. It's an interesting first task.
>> But the cool thing about the data viewer is it gives you this ability to do some really quick checks. For example, to make sure that my data is clean because most data scientists know you do not get clean data, you get data, you got to clean it. You are in charge of that, you have to make sure it's right before you start working with any models. Something I might check really quick is to see, are any of my ages negative? We know that shouldn't exist. As you can see, I've put in the filter here for less than zero, no values, no row entries, which means good to go.
>> Sweet. So you can put in complex conditional statements in there, and it'll check just [inaudible], right?
>> Yeah. We're working with more complex stuff, but right now, we have the most simple ones for age greater than five, things like this, equal to. We're also going to be adding not equal to support [inaudible] as well. Just quick checks to make sure that your data is all in order.
>> That's pretty nifty. What kind of tools are data scientists using when they're not using Jupyter Notebook? Because just based off the stuff you've shown so far, it seems like it would be an extremely useful and more legible tool to be able to see all the contents that you need to know.
>> They're really using Notebooks. That's the reason why they've become the de facto tool.
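The negative-age check done with the data viewer filter above can also be written as code. This is a dependency-free sketch: the three rows are invented sample values shaped like the columns named in the episode (pclass, sex, age, fare), and `rows_where_age_below` is a hypothetical helper, not part of any tool shown.

```python
# Invented sample rows; real Titanic data would be loaded from a file instead.
passengers = [
    {"pclass": 1, "sex": "female", "age": 29.0, "fare": 211.34},
    {"pclass": 3, "sex": "male", "age": 22.0, "fare": 7.25},
    {"pclass": 2, "sex": "male", "age": None, "fare": 0.0},  # missing age
]

def rows_where_age_below(rows, threshold):
    """Return the rows whose age is known and strictly below the threshold."""
    return [row for row in rows if row["age"] is not None and row["age"] < threshold]

# Equivalent of typing "< 0" into the age column filter.
# With a pandas data frame named df, this would roughly be: df[df["age"] < 0]
negative_age_rows = rows_where_age_below(passengers, 0)
print(len(negative_age_rows))
```

An empty result means no row has a negative age, which is the "good to go" outcome described in the conversation.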
Things like regular Python scripts, while they can be great, just don't give you that flexibility and that ability to iterate, to continuously run through, change, and check, which is really, really common in data science.
>> That's really cool. Also, looking at the main page that you were showing with all the data and the different sections, it looked like a Wikipedia article almost. I like the legibility of that playground and stuff. I'm sure that's useful for a lot of other people like your team members and other fellow data scientists you want to share the information with too, right?
>> Yeah. Definitely. Otherwise, things like that that you can do in the data viewer, you would have to be writing code for, you'd have to be writing things like select this column, check if it's giving the count for anything less than zero. You have to write code to check those things. The data viewer just makes it a lot easier to give you a shortlist. So at least, if you're writing code to fix your data, clean it up, you're not wasting your time for a bunch of other columns that don't need it right now.
>> Got you. That's pretty cool. I can also see this being used as an instructional tool almost. Just looking at the different segments of code snippets, can you write dialogue that explains what each snippet does and is going to return data wise, and then the student can play around with it and walk through essentially a full-blown tutorial?
>> Absolutely, yeah. There's a lot of educators who are actually moving towards this tool, especially for data science if you're an educator for data science. I can show you an example really quick as well. Even when I took my data science certificate, a lot of teachers, they give you some notebooks, you have some markdown, and you might say something like, "In this section, we will go over data imports and basic stats." Let's do that.
I can run that, I can render it, and then you can see here that, as an educator, hey, this is where we're going to be discussing this type of code or things like that.
>> Nice. That seems so simple too. But that's really cool. It makes me want to get into data science.
>> Quick and easy.
>> Yeah, very quick and easy. Are there any limitations that Jupyter Notebooks currently has that your team is working to improve on?
>> Any limitations that we're working on improving on. I wouldn't say necessarily limitations, at least off the top of my head, maybe if you give me a day, I'll give you a list. No, not at this time.
>> Cool.
>> Because there aren't any, I want to be clear. Just because [inaudible].
>> No worries. At first glance, it looks like there's so many powerful things that you can do with that.
>> Yeah.
>> With that then, what's next for Jupyter Notebooks?
>> What's next for Jupyter Notebooks. I can actually talk about one feature that we just enabled on the VS Code side, and people have been requesting this for a really long time, so I would love to mention it. We have support for Multi select, which we haven't had before in our last release. Let me show you an example of what that looks like. If I scroll down here, I can select this cell. This is just one selection, you can tell by the border on that cell, but if I want to select another cell, I can hit "Control" and click this next cell. That'll select both of these for me. I can do "Control" and click another one, or even just the traditional "Shift", which I have this selected, I click "Shift" and I select four cells. Then you can [inaudible] actions with those which used to be a huge nuisance because we couldn't group cells.
>> Sweet. That's some good stuff. What else you got?
>> Now, so one of the features that's coming, it's on its way, actually piggybacks off of Multi select, is what we call Smart select. Let's go to the bottom of this notebook where I have a couple of models.
I've decided I actually want to move forward with this model here. This is a Naive Bayes Classifier, it was the most accurate one, I want to go with this one. I can actually go ahead and click "Smart select". It's going to be surfaced within this cell toolbar here. Not quite yet, but you guys are getting a sneak preview of where it will be. If I were to click "Smart select" here for this cell, I would start to see that highlighted background on this notebook, and it's going to select all the cells that I need that are required to generate that cell and that cell's output. Once I have those actually selected, users will be able to take other actions. You can run those cells, you can export them, you could even merge them, you could turn them into a function. It's basically going to give you a shortcut to your code that you may not necessarily need all of it. Again, coming back to that whole, the notebook is a playground. By the time we're done with it, there's probably like five different models in there, we decide to go with one, this is going to help with that cleanup process for you.
>> Awesome. It's the little things.
>> It's the little things. It behaves similarly to a feature that we already have today that actually cleans up at the line level as opposed to the cell level that's called Gather, and that's an experimental extension that we have so that's something you have to install in the VS Code marketplace. But if I were to hit "Gather", which you actually see here today on this particular cell, it's going to generate another notebook, and it's going to take just the lines of code that are necessary to create that last cell that I gathered on.
>> Cool. Sweet.
>> Also, it helps me clean the process.
>> Awesome stuff. To wrap things up, you also mentioned I think something about Live Share, is that going to be in the near future with Jupyter Notebooks somehow?
>> Yes. We already do have some preliminary support for Live Share.
You can see your counterpart's notebook, you'll be able to add and run cells, and they're just working on finalizing the other goodies, but there's already some support for it, but we plan on that being a full, out of the box, supported experience for users. So collaboration will be a lot easier in Notebooks.
>> Awesome. Nice to know that data scientists aren't being left out of the collaboration experience.
>> Absolutely not.
>> Sweet. To wrap things up, where do people get started or how does one get started? What do I need to download or install?
>> Great. Thanks for asking my most important part. This specific experience you're seeing today, you'll have to install VS Code Insiders. You can just Google VS Code Insiders, and just download the Python extension if you're working with Python. All the stuff I've gone through today is available for Python. If you're not working with Python, you can just download the Jupyter extension, but all of the goodies right now are for the Python extension.
>> Good stuff. Well, I hope everyone's really excited to try those tools out. Anything else to close this out?
>> No. Just give it a try and if anybody has feedback for me, they can reach out. Hopefully I can put my e-mail up somewhere and somebody can give me feedback. We're open to listening to everything you guys have to say and making it better for you, so please get in touch.
>> Awesome. Also, I do have one more question because VS Code has such a wide community in open source and everything. Is there a way to make your notebooks go public so that other people can see each other's notebooks?
>> Yeah. Right now, a lot of people are using GitHub for that. They're creating repos. As you can imagine, we have our open source repo. People can actually contribute to the project if they want as well, always give that plug if people want to contribute, but we recommend just using GitHub.
There's a pretty wide community there of people sharing notebooks and goals with notebooks that they have.
>> Exciting. Cool. Well, thank you so much for sharing all that really cool information about the beautiful playground that is Jupyter Notebooks, Claudia.
>> Thanks. Appreciate it. Thanks for having me.
>> Yeah, thanks for being here. Until next time, happy coding.
[MUSIC]
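The idea behind Smart select and Gather, as described in the conversation, is to find only the cells needed to produce a chosen cell. The sketch below illustrates that idea with a simple backward walk over variable names. It is not the implementation used by VS Code or the Gather extension: the function names and sample cells are hypothetical, and it assumes cells were run once, top to bottom, and that dependencies can be read from plain variable names.

```python
import ast

def names_defined_and_used(source):
    """Return (defined, used) sets of names appearing anywhere in a cell's source.

    Crude on purpose: names inside function bodies are counted too.
    """
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                # Load (and Del) contexts are treated as uses.
                used.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
    return defined, used

def required_cells(cells, target):
    """Return indexes of the cells needed to run cells[target], target included."""
    _, needed = names_defined_and_used(cells[target])
    selected = [target]
    # Walk backwards: a cell is kept if it defines a name still needed, and
    # the names it uses then become needed in turn.
    for index in range(target - 1, -1, -1):
        defined, used = names_defined_and_used(cells[index])
        if defined & needed:
            selected.append(index)
            needed = (needed - defined) | used
    return sorted(selected)

# Hypothetical notebook: two independent experiments, only one feeds the last cell.
cells = [
    "ages = [22, 38, 26]",
    "fares = [7.25, 71.28, 7.92]",
    "mean_age = sum(ages) / len(ages)",
    "mean_fare = sum(fares) / len(fares)",
    "print(mean_age)",
]

print(required_cells(cells, 4))
```

Here the cells about fares are left out because the last cell never depends on them, which mirrors the cleanup use case of keeping one model out of several tried.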
Info
Channel: Microsoft Visual Studio
Views: 36,449
Id: Ozq24uAshXo
Length: 15min 57sec (957 seconds)
Published: Thu Apr 22 2021