Jupyter Notebooks in VS Code Extension - Tutorial Introducing Kernels, Markdown, & Cells

Captions
when you're analyzing data it can often feel like you're investigating an unsolved mystery you've got some problem or a data set and some hunches about what's going on so you form some hypotheses and start to develop a narrative that allows you to tell a story and check out step by step whether or not there's evidence to support your hypotheses or refute them today we're going to look at a popular tool jupyter notebooks used throughout industry and academia by data scientists to enable this sort of narrative driven development where we're going to write some prose that describes what type of investigation we're going to do interleaved and followed by code that actually does the analysis and looks for evidence in the data followed by more narrative that continues to build on that and more code and so on such that we have what feels a lot like a scientific notebook where we're able to keep track of both the reasoning that we're going through while we're working through a problem as well as the code which does the analysis we're going to look specifically at how to use jupyter notebooks inside of vs code which has a brand new and awesome developer experience for working with this technology so in order to use jupyter notebooks and i should mention that the name jupyter is spelled with a py because it was initially made for working with python style programs but the jupyter universe actually has support for other programming languages in this video we're going to be looking at using python in jupyter notebooks to get it going in vs code you need to have the python extension installed in vs code which will come bundled with jupyter and some other extensions that make this all possible if you've been working with python you probably already have those installed so in the lessons directory i'm going to set up a new jupyter notebook and i'll just name this notebook.ipynb so notice right off the bat that .ipynb is the extension we use for a jupyter notebook and
that contrasts with just the .py extension we would use for a traditional python file you'll also notice that this editor has many different things going on we've got some buttons here at the top we've got one text cell that looks a little bit different than we're used to being able to edit in when we open a python file and up here on the right you'll see that i have python 3.9.7 you may see a button that says something about selecting a kernel or no kernel selected if you click this you'll get a list of the python versions you have installed and i would encourage you to select the most recent version maybe it's 3.9 maybe it's 3.10 you may be prompted by vs code to install some extra packages such as jupyter and some others that are commonly used in python notebooks i would accept all of those and that should get you going once we've selected a kernel the buttons at the top might change such that you see this restart button and we'll come back and talk about each of these buttons in due time but let's go ahead and start thinking about this idea that we can do narrative or story driven development and have documentation leading the way and code interspersed with it so over this cell you'll notice there's a sort of pop-up and i have the ability to click the delete cell button so i'm going to go ahead and delete one cell the one that it initially set up for us and you'll notice that we have the option to choose from either adding a code cell or a markdown cell markdown is a special kind of formatting we can use to write text that will cause it to be formatted in a pretty nice good-looking way once we do what we call render it or have the text be interpreted with formatting but when we're editing markdown we can do things like use the single hashtag to say this is a heading and say demo jupyter notebook and on a couple of new lines i would encourage you to follow along and add some documentation that says this notebook will
exemplify some of the common use cases in python jupyter notebooks you'll notice that to the side here we have this markdown is what it's telling us this type of cell is so jupyter notebooks are made up of individual cells that are ordered and come one after the other and if i click the check button here you'll notice that it says we can stop editing the cell and as soon as i do that you'll see that we have this nicely formatted text so i click the check and it will render it as formatted text if you double click it we can go back to editing it so if you were to edit it with something like two asterisks around the word some you'll notice that in the editor we get slightly different formatting this will bold that text and if you go and render it by clicking that checkbox you see that oh yeah this has been bolded so markdown has some notations that we can use such as this hashtag and such as these asterisks that will allow us to format the text that we're writing in our programs in a common way markdown is a technology that's common in many different scenarios and environments around developer tools as well as on the internet and you'll find it in many different places around the web it's a really nice way to work in plain text but still get nicely formatted text outside of it okay so we've seen a markdown cell and we know how to render it how do we add a code cell well if you click the plus code button if you hover over your cell towards the bottom of it you'll notice that we get this popup down here that's one way of doing it and we could delete this code cell if we wanted you can also press the plus code button over here and that will add a cell that allows you to type into it and what we'll see is we can enter some code here so let's say we declare a variable such as name which is type string and say the name is my name right and we can print the name okay so as we're editing this notebook you'll
notice that we're able to enter some python code into a code cell we can again see over here on the right this is telling us hey you're working in python in the cell and you'll notice this execute cell button that allows us to run the cell interactively and so notice that chris was printed out and maybe we make this something a little bit more interesting like an f string that says hello followed by name right and i try running the cell again and we see okay yeah we can run statements and little snippets of code in these text cells we can then go back to text and add some markdown and let's add some notes about formatting markdown so i'm going to add a subheader here and say that we can do things like you can have lists you can have links and so if i were to make an example of a link that's a word that's surrounded in two square brackets and then maybe why don't we go and find a good reference for markdown syntax so there's nothing too special about markdown again it's a very common formatting language so if i were to search here for markdown cheat sheet what we would see is you know you can pick any link and i'll choose the first result and you'll notice that here we've got some syntax telling us how to define headings we can see the bolded text some of these ordered and unordered lists and then there's some fancier things that you can do in extended markdown so i might just copy this url markdownguide.org/cheat-sheet and in these parentheses following the link you can have links such as to a markdown cheat sheet and when i render this so when i click this check box notice that that's become a link that's clickable and it brings us back to where we were all right so in this video learning a little bit more about the actual syntax of markdown is beyond our scope we're not really worried about that the point is as you're developing and you're doing some data analysis as you're developing your notebook it's really nice to be able to mix in
prose that's formatted in ways that are nicer and easier to read than you have in like traditional code comments interspersed with the code that's doing the analysis all right so let's actually talk a little bit more about what's going on in this example and let's maybe add one more code cell below this formatted markdown and in this code cell let's say that maybe we print one more message that is another reference made in this cell and i evaluate that cell and we see another reference to chris is made in that cell so notice we're able to reuse a variable that we had declared and set up and run in a previous cell later on now the model for how jupyter notebooks work feels very straightforward when you're working with it in this way but as you get into some other scenarios can be a little bit confusing the good news is vs code and jupyter notebooks have ways of helping you understand what's going on and one of the most important ones that i like to show is this variables button here so if you notice up in our top there's this button variables and if i click that you'll notice that we see so far in this running jupyter notebook i have a variable named name it's a type string and its length is four its value is chris right if we were to go back to this cell and change this so hello khaki and i execute the cell so i press that play button again notice that it says hello khaki and this variable value has updated down here in our listing of variables but something that feels kind of curious is okay we just changed this name variable in this code cell but notice that the output down here still says another reference to chris is made in this cell so it's not until we execute this cell that we get that to update to another reference to khaki is made here so hopefully you're already starting to get a sense of something that's really important to understand with jupyter notebooks you get to run the cells in any order you want and when you run a cell what you're
doing is you're impacting this jupyter variable state which winds up just being a global state effectively so let's think about what's going on there a little bit more specifically and what happens when we restart a kernel so you'll notice that there's this fancy word kernel and that just means it's the running program that's sitting behind your jupyter notebook such that whenever you run and execute one of your code cells it's executed in terms of that running kernel so if we restart the kernel so try clicking this restart button you'll notice that there's no variables defined and that feels kind of confusing because one of the features of jupyter notebooks is it saves the last bit of output from any code cell that you've written so it kind of looks like this program has run it kind of looks like there should be a variable named name established but when you start up a kernel fresh like we just did by restarting it none of your code cells have evaluated yet so i'm going to show you two ways of going about what to do when you open a jupyter notebook you were working on previously for the very first time or you find yourself in a position where you need to restart a jupyter notebook which can be common if for some reason you get into a weird place or you just want to test your notebook to make sure that it's still working in the way that you expect if you were to start over from scratch so if this cell hasn't run and we've just restarted our kernel like i just did there's no variables defined down here notice this really helpful button that says run all so if i press this it's going to go from the top of my jupyter notebook all the way down to the bottom and re-run every cell along the way notice that that means that this variable name became established when this first code cell ran and we're in good shape all right if i restart my kernel again i want to point out that you don't have to run these cells in the exact order but you should be very careful
with this because if i were to go try and run this cell right notice that we get an error name has not yet been defined that's because the cell where we defined name it hasn't run yet so our kernel in its global state which we see everything defined in the kernel's global state in this jupyter variables box hasn't been evaluated yet so if we were to rerun this cell first and then go run the cell that came after it that made a reference to that variable you can see that everything works and all is good this is different than how you're used to working with stored programs in a python file where you save your program and you run the entire thing through once and it feels a little bit funky right until you get used to this idea that okay when we start a kernel or we restart a kernel we've effectively got a blank slate no code has been evaluated in the context of this running jupyter and python program yet so once we execute a cell those two lines of code will be interpreted and evaluated and we can run these cells in any order you want typically it's best practice to write your notebook such that you can run all of your cells in linear order and have everything work out just fine like we have here so i can run them all and we see that we've outputted everything as we would expect one other thing you need to be careful of that's different from a stored program that feels actually a little bit more like when you're working in the terminal and a read evaluate print loop or a repl for short is that notice that this circle is telling us that we haven't actually saved the work in this notebook we're running these cells in this running python program interactively but our work hasn't been saved yet and you have to be careful of this because if you want to submit work or send a file to a colleague from a notebook that you're working on you might not have the changes that you think you have saved in your notebook actually there until you save your file explicitly so in
vs code you can go to file save or just press control s on windows command s on mac in order to save that file so in vs code when we see that dot has gone away that means that we have actually saved all our work one other thing that can be a best practice to do to be sure that your notebook is working is you can clear the outputs of all your code cells so notice that now the outputs you know below these print statements are no longer there and then try running all again and being sure that you don't have any errors going through right in terms of things to be careful of there's one other very important error that is common when you're working with a jupyter notebook well if you're working and you're not working your way top to bottom and you go up and you change something early in your program like let's say we change this variable name to user and we change this to say hello user right and let's say we change this string value that we're going to assign to user to be say mark all right so when i evaluate the cell notice even though we had taken away the previous declaration and initialization of the variable name that doesn't matter because our kernel is still running and we had previously executed a cell that had name declared in it and defined but now we've added a variable user and notice that the value for that user is mark while name is still khaki so if we came down here and evaluated this we see that there's this hint that hey you may not have defined name but that's actually vs code trying to be helpful we can still evaluate this and this will still work because name is still a variable that is in our global space because we previously put it there that's assigned to khaki so it looks like this is still okay but it's not actually what we would expect we kind of expected this to be mark but we forgot to update this variable name later in our notebook so how can you protect against that well one of the best things you can do is after you've made
some changes to earlier code in a notebook try restarting your kernel notice once again we've now got our variables cleared we run this again we see that user is declared but there's no name because we removed that declaration and now we have an error in this cell because the name variable no longer exists in this running kernel remember when you restart a kernel it's saying like throw all of the variables that were previously set up globally away such that you're working with a clean slate until you start to evaluate more code cells so another reference to user is made in this cell we can update that variable there and we see that that's now working if i were to now restart this kernel one more time and run all my cells from top to bottom we see that okay we're in a good place as you get comfortable working in these cells and working in this mode you'll often find that it's kind of a pain to have to take your fingers off the keyboard move your hand to a mouse or something and press this execute button so there are a few shortcuts you should know of and so let's go ahead and add a markdown cell here useful shortcuts keyboard shortcuts right so the first is execute a cell and leave the focus or cursor in that cell right and actually let's put the shortcut here at the very start of this and i'm going to surround it in back ticks so back ticks are a special markdown formatting that will give us code or it'll look like code once we render this what we call a monospace font and so control plus enter or return on a mac is what will cause a cell to be evaluated so if i were to press that now control enter notice that in a markdown cell that just causes it to be formatted in pretty text not using markdown again we can double click to go back into editing it control plus enter will render it if we're in a code cell notice i can make some changes here and i can press control
enter and that's what will cause that cell to re-evaluate and execute the code that's contained in that cell in the context of the running kernel behind the scenes great so that's one of the most common shortcuts that you'll use you can just edit your code control enter boom another one is shift plus enter will execute a cell and move to the next cell or create a new cell at the end right and so i'm going to press ctrl enter actually let's press shift enter here shift enter and boom notice that we've moved down to another cell and because we were previously editing the very last cell in our notebook this went ahead and created a new cell for us it made it a markdown cell so in vs code it's continuing on with whatever type of cell you were previously in it's making another one of the same type but we can actually convert back and forth using these ellipses here so if you'll notice these ellipses and we can change cell to code or change cell to markdown and that will allow us to toggle back and forth so again if i'm in a python cell and i say change cell to markdown that will allow us to enter some text if i'm in a markdown cell and i want to actually write code there we can use the ellipses to change it back to code and here we could add some more code so a common thing that we might do is some analysis where maybe we're doing some arithmetic and this will be a very simple trivial demonstration i don't have a good example right off the bat but let's say total is say a float that is the result of you know maybe we had previously computed some sum to be 110 plus 1000 right one of the common things that you'll see in jupyter notebooks as you become more familiar with them and you see what other people do is you're going to want to initially because you're used to needing to print your values in your program in order to see them as output you'll have this instinct to write print and then total and then maybe you'll add one
more print statement that says you know some important computation bleep bloop and i used ctrl enter to evaluate my cell oops i didn't mean to set up a new frame there okay so ctrl enter evaluates the cell and notice that these two lines are printing out well something that's very common in jupyter notebooks is that if you just have an expression any expression as the very last line of a cell notice that i ran and evaluated the cell one more time and we still see that expression being printed out so i could say total plus you know 200 000 to make it obvious that this is actually an expression and notice that sure enough we're still seeing that evaluation so in some ways this is like a repl where you can enter an expression and you'll automatically have it printed out if it's the very last expression in your cell this only applies to the last line of a cell and if it's an expression the jupyter notebook will try and render it as output for that cell when you evaluate it this is pretty common and handy so you don't have to always print something you can just end your cells with an expression so maybe let's make a note of that back up in the useful keyboard shortcuts we might also add actually let's add one more markdown cell down below the cell where we made that simple little computation and add another note that is expressions written as the last line of a cell are automatically output this is very handy for not needing to call the print function at the end of every cell so typically when you're doing some analysis the result of some code block is going to be some value and you just use that value as the very last expression in that cell and you'll see its output when you evaluate that cell just beneath it all right so that's very common so we can again restart our kernel clear all the output and try running all and we see that okay everything is still in good shape and notice that you know when you
clear output any cell that hasn't been evaluated you see no output below it hopefully that's pretty straightforward so these are some of the most fundamental operations you have when working in a jupyter notebook and some of the handiest keyboard shortcuts that will save you a lot of time once you get into the habit of using them there's one other thing i want to mention which is how can we actually import functions that we write in other files python files into our notebooks in order to demonstrate this i'm once again going to save and notice that we made a lot of changes and ran a lot of cells without needing to save our notebooks so always be in a habit of saving progress in your notebook regularly and i'm going to in the same directory or the same package as my notebook set up a file that is say notebook_helpers.py okay and let's just define a very silly function here example helper functions for a notebook there's nothing fancy or special about my choosing the name notebook_helpers i just wanted to pick a module name that was valid and let's just define a very simple function such as add two ints right so very very simple silly function x is an int y is an int and we're going to return an int and add x to y and return x plus y right so the point of this function is not to actually implement something useful but just to give us and actually let's use python naming conventions here so i should name this add_two_ints sorry i should use snake casing there and so we've got our function now the question is how do we import this function into our notebook so that we can develop some helper functions and we don't have to have all of our functions defined inside of our notebook that will allow us to cleanly separate some of our algorithms from our analysis and reuse these algorithms or the functions that we use in multiple different notebooks well something that's kind of funny about the way notebooks are run is that
they tend to be run the kernel is evaluating inside of the same package as the module so how you're used to importing say one file into another such that we would say you know from lessons import notebook_helpers here we're going to actually import this in a slightly different way and let's go add an example of that so i'm going to at the very end of my notebook add a code actually let's add one more markdown cell with a subheading of example of importing a module all right so we can import the module directly so i could import notebook_helpers right and notice that i had edited this as a code cell and i evaluated it with ctrl enter and we don't get any output right but if we wanted to we could do something like notebook_helpers dot add and then what did we name this two ints and then let's say one and two and we expect three to be printed out and sure enough your mind is blown we're printing three out we can convince ourselves this works change one of our arguments here re-evaluate the cell and 21
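A minimal sketch of the helper module described in the captions above, assuming a file named notebook_helpers.py saved in the same directory as the notebook. The function name and signature follow what the captions say; the docstrings and the demo lines at the bottom are additions for illustration only.

```python
"""Example helper functions for a notebook (contents of notebook_helpers.py)."""


def add_two_ints(x: int, y: int) -> int:
    """Return the sum of two ints; deliberately trivial, as in the video."""
    return x + y


# In a notebook cell in the same directory, the captions describe importing
# the module and calling the function through it:
#
#     import notebook_helpers
#     notebook_helpers.add_two_ints(1, 2)
#
# The call is repeated here directly so this file can also be run on its own.
result = add_two_ints(1, 2)
print(result)
```

Keeping helpers like this in a plain .py file is what lets the same function be reused from more than one notebook.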
all right so that's pretty exciting that we are able to import functions from other places and make use of the functions defined there similar to what we've seen before you can also if we wanted to import just the add_two_ints function directly you could say from notebook_helpers import add_two_ints and we can call that function directly add_two_ints with 10 and 20 to get 30 and boom there you go some really handy features that are built into the vs code python or jupyter notebook experience are that we can toggle our subheadings so notice we can kind of collapse some of this analysis that we've done such that we can just see the markdown and you can do this all the way at the top level and then when we have you know here's the formatting markdown notes if there was code specifically related to doing the analysis of that section you can hide it or unhide it using the chevron that turns to the left or faces downwards all right so this is a toggle that allows you to hide relative to the headings of your document and we can also see if i were to you know view the outline of this notice that what we used as headings demo jupyter notebook formatting markdown notes useful keyboard shortcuts the headings of our document we can find in this outline and we can click around to quickly jump through a notebook because as you're doing data analysis some of your notebooks might become rather long if you've got a significant narrative around the analysis you're doing to tell the story of your data and do the investigation and so it's really handy to be able to jump around and jump to specific points in your analysis and do that kind of documentation again some of the key fundamental ideas that i want to just press home one more time are that when you restart a kernel you're effectively re setting up a blank slate and until you evaluate a code cell no code from your notebook has actually run in this and that feels a little bit
misleading when you see some output here so one of the first things i would encourage you to get in the habit of doing is after you restart a jupyter notebook or when you open one up fresh for the first time so maybe let's try that so i'm going to close my notebook and open it up for the first time oops that was my helpers file so i open this up for the first time and if we go and look at my variables you're going to notice that there's no variables defined because in vs code when you close the tab that has a jupyter notebook running it automatically tries to stop that kernel on your behalf so when you open it back up you've got a clean slate no variables defined and that's kind of confusing because it looks like there's output here remember the notebook is saving the previous evaluation of a cell's output and giving it to you here so when you open up a notebook for the first time i would encourage you to clear the output of that notebook straight away as the first thing you do and then go run all and be sure you're still in a good place by using this run all button you ensure that you don't accidentally try working on your program later where you left off without having reevaluated your cells because you can get into a bad way and let me just demonstrate that so if i close that and i open up my notebook one more time okay and i scroll down and let's say this is where i left off this is where i wanted to keep working and keep working on my program in my jupyter notebook here when i run this cell ah interesting vs code actually did something very clever here and it went and re-ran the cell above when we went back in for the first time that's actually an awesome feature that i wasn't aware that they added to this version i actually want to convince myself that that actually did what i think it did okay so i keep opening the helpers file notice there's no output okay and i scroll down to this point and i press play
and that's just execute cell and yeah it actually did something super clever that's going to save you from accidentally forgetting to rerun all your cells although i still think that's good practice and it went ahead and ran those cells on your behalf my understanding is the last time i used the default jupyter notebooks it didn't do that for you and you could run into certain errors in any case i do think there's value in clearing all your output and going ahead and running all and just scrolling through and being sure your program doesn't have any errors in it and then continuing to work on your program if you're ever confused about what is actually in memory in your kernel because you've changed variable names and things like that using this variables tool will help you get a sense of what's actually in memory so that's the introduction to using jupyter notebooks in vs code they're a very powerful tool for doing story driven development where you start off with what is the analysis you're hoping to do explain it in english then load your data then set up what kind of hypothesis are you testing explain that in english with some nice formatting and then write some code to go do that analysis and so on you'll find this is very handy when you're analyzing data when you're trying to search for certain features in your data sets and when you're trying to make decisions using data science tools and frameworks and you'll see it all over the place in the data science world good luck with your jupyter notebooks moving forward
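The kernel behavior walked through in the captions (out-of-order runs, leftover variables, restarts) can be sketched in plain Python. This is a simplified model under stated assumptions, not how Jupyter is actually implemented: the kernel is stood in for by one shared dictionary, and the helper names restart_kernel, run_cell and user_variables are hypothetical, invented for this illustration.

```python
# A simplified model of a notebook kernel: one shared dictionary of globals.
# This is an illustration only, not how Jupyter is actually implemented.

cell_define = 'name = "Chris"'
cell_use = 'message = f"Another reference to {name} is made in this cell"'


def restart_kernel() -> dict:
    """Return an empty namespace, like the variables panel after a restart."""
    return {}


def run_cell(source: str, kernel: dict) -> None:
    """Execute one cell's source against the kernel's shared namespace."""
    exec(source, kernel)


def user_variables(kernel: dict) -> dict:
    """List variables the cells defined, hiding Python's own entries."""
    return {k: v for k, v in kernel.items() if not k.startswith("__")}


# 1. Fresh kernel: running the second cell first fails with a NameError.
kernel = restart_kernel()
try:
    run_cell(cell_use, kernel)
    failed_before_definition = False
except NameError:
    failed_before_definition = True

# 2. Running the cells in order works.
run_cell(cell_define, kernel)
run_cell(cell_use, kernel)
message_in_order = kernel["message"]

# 3. Edit the first cell to define `user` instead of `name`, then re-run it.
#    The old `name` is still in memory, so the stale second cell still works.
run_cell('user = "Mark"', kernel)
run_cell(cell_use, kernel)
stale_names = sorted(user_variables(kernel))

# 4. Restart and run top to bottom: now the leftover reference is exposed.
kernel = restart_kernel()
run_cell('user = "Mark"', kernel)
try:
    run_cell(cell_use, kernel)
    failed_after_restart = False
except NameError:
    failed_after_restart = True

print(failed_before_definition, message_in_order, stale_names, failed_after_restart)
```

Step 3 is the case the captions warn about: nothing fails, yet the notebook no longer matches what is in memory, which is why restarting and using run all is the suggested check.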
Info
Channel: Kris Jordan
Views: 24,114
Id: HJgX1WWC26A
Length: 33min 9sec (1989 seconds)
Published: Tue Oct 19 2021