Jupyter Notebooks vs Python Scripts | When to Use Which?

Captions
You haven't seen me use Jupyter notebooks on the channel yet. Well, that's changing now. I'll go through a couple of examples today showing you when Jupyter notebooks are a great solution, but I'll also show you what can go wrong and when it makes sense to write plain old Python scripts like a caveman.

Before we start: if you want to learn more about how to design a piece of software from scratch, I have a free guide for you. You can get it at arjan.codes/designguide. It contains the seven steps that I take when I design new software, and hopefully it helps you avoid some of the mistakes that I made in the past. I've put the link in the description of this video.

Now let's take a closer look at Jupyter notebooks and what you can do with them. Jupyter gets its name from its ability to run all sorts of different languages, notably Julia, Python, and R; combine those and you get Jupyter. You can run Jupyter in a web interface, but VS Code also has an extension, which I have installed here. You can install that too, and it's the one I'm going to use in this video.

What is nice about Jupyter is that it allows for a more exploratory way of coding. I'm personally not used to that at all; I've always written scripts and worked from there. But it's actually pretty nice to work in this way, especially if you're a data scientist. You often need to learn things about your data, and you may not be ready to write full scripts because you don't really know yet what you're going to build. That's where Jupyter is very powerful.

If you have some experience with Python, you probably already know about Jupyter notebooks. I'd just like to show you a couple of things that you can do with them, and then also talk about some of the problems that you might run into. To show that, I'm going to start with a UFO sightings example. Basically, what I'm trying to discover is where I should go if I want to see a UFO, because we all want to see those things, right? So I got some data from Kaggle. Kaggle has a data set that you can use for
this; this is the one that I used.

Jupyter notebooks consist of cells, and there are basically three types: a Markdown input cell, which is the one that you see here; a code input cell, which is this; and an output cell, which is what you see when you execute a piece of code. It shows all the outputs.

One pro of using Jupyter notebooks versus just writing a Python script is that you can run code piece by piece. For example, let me run these imports, and then I can continue to the next cell. Here I'm reading data from a CSV file, which is the Kaggle data set. I'm doing some simple processing in pandas to clean things up and convert things, and then I'm going to print out some information. When I run this, you see that we get a table with cities, states, countries, etc. There's some missing data; that's not so important for now. This is obviously really useful to quickly get an idea of what kind of data you have and what you need to do to clean it up, like fixing these not-a-number (NaN) issues here. Also, because we have Markdown cells, you can write down what you've done and what your current ideas about the data are, so you keep track of what you've been doing.

The next thing that Jupyter is really useful for is that it can very easily plot data for you. Let's run this piece of code, where I'm basically plotting something with a title and x and y labels based on the data. You can see the number of sightings plotted against time, and we can definitely see that things are increasing. What you can also already see from this chart is that there is some wiggling happening here, so maybe there's some seasonality, perhaps more UFOs in the summer or something. I have no idea; perhaps that's tourist season for the aliens. It might be something that you want to explore further, but I'm not going to do that today. Another thing you can clearly see is that around the year 2000 we have a rapidly expanding number of UFO sightings, so that could be something else
that you might want to explore. That's a major advantage of something like Jupyter: you can quickly visualize things and then, based on what you see, decide what you need to do next.

Let's say we want to find out where we should go if we want to have the biggest chance of encountering a UFO. I'm going to create a scatter plot here to see where all the sightings have been, and, well, this is what it looks like. You can already see the outlines of the country. There are not that many UFOs at sea, apparently, but it's clear that we definitely need to go to the US to see a UFO.

Now let's take a look at the countries that are most visited. Let me run this, and then we see a little table where we can clearly see that the US is indeed the most popular country. We can also do that more visually by plotting this, and then this is what we get. Again, you'd have to clean the data more to get more accurate results.

Now, the problem is that the US is pretty large. So if we really want to make sure we see a UFO, we shouldn't go just anywhere in the US at random; we should probably explore the US a bit further. We can add some code to the notebook to help us out with that. Here we have a helper function that gives us information about the density per square degree, where one square degree is one degree of latitude by one degree of longitude: basically a little square (well, little) of about 111 by 111 kilometers. Let's also run this function so that Jupyter knows it exists. Then we can use the apply method of the data frame to apply the function that we just created to the data and store the results in the data frame, and this is what we get. This function takes quite a bit of time, so we have to wait until it finishes all the processing. Running it took about four and a half minutes.

Now let's take a look at the top 10 sites. It's pretty clear: if you want to see a UFO, you have to go to Cerritos. I'm not sure if I'm pronouncing that correctly, but that's the LA
area in general; that's the place to be.

Now that's all great, but notebooks can also introduce a couple of serious problems. Let's take a look at another example. Here the idea is that I want to create a game that involves rolling dice. I've again imported a couple of things here, so let's just run this; these are the things that we're going to need. Then I'm going to define the number of sides, because, you know, if you're playing serious board games like Dungeons and Dragons, we're not going to use six-sided dice, right? That's for amateurs. But let's start with this simple example. I've defined the constant number of sides as six, and now we can define a function that we can then use in other cells. It rolls n dice and gives the total. Let me define that function like so.

What's very nice about Jupyter notebooks is that we can really quickly visualize the results of running the code. For example, let's say you want to find out what the most common total is if you have two dice. If you know a bit of statistics, then you know that's going to be 7, but you can actually run this: simulate the roll many times, 10,000 times, and then plot that as a histogram. And indeed, the most common total is seven.

That's all still pretty nice, but let's say you want to figure out if there's a difference between rolling six 20-sided dice and rolling a six-sided die 20 times, which is also a lot of work, obviously. What I'm going to do first is compute the outcomes for 20 d6s and then create a plot from that, and this is basically what we get. Then I'm going to change this to a 20-sided die, compute the outcomes for six d20s, and create another plot, and this is what we get. As you can see, rolling 20 d6s gives this result and rolling six d20s gives this result, which looks to be slightly lower. Another thing you
see is that the spread for rolling 20 d6s is actually less than the spread for rolling six 20-sided dice. Your statistics professor is probably going to talk about the central limit theorem and things like that to explain that to you, but what it does show is that you can't simply exchange rolling six d20s for rolling 20 d6s and expect the same results.

What's the problem here? Well, let's say I go back and want to rerun this example and create a new histogram. Now we see that this no longer produces the same result, because of course we changed the constant number of sides here to 20. That's a problem with Jupyter notebooks, and with this kind of exploratory coding in general: you always have to be careful about what your current context is, because it might change, and that might lead to confusing or unexpected results.

The main takeaway here is that we shouldn't have used a global variable, because that basically screwed up everything. Instead, we should have passed the number of sides as an argument to the function. Here I have redefined the function to accept the number of sides, defaulting to six, so I'm also going to run that. That gives us another benefit: in the code for computing the different outcomes, we can now clearly see what the difference is in rolling the dice. Let me run this again to show you what it gives us.

But now if we go back and run this cell again, we again get a different result, which is really confusing. Of course, the problem here is that we redefined the function and gave it a default argument value, but this code still works on the assumption that there is a global constant. The numbers are changing all over the place, and it's going to be very confusing. So even trying to add a function later on to fix problems may lead to even more confusion. Especially if you're doing this type of iterative, exploratory development, you have to make
sure you keep proper track of the state of things and that you didn't accidentally change something in another place that affects the previous code.

What you can do in a Jupyter notebook is restart and clear all the outputs. To do that in VS Code, you simply click Restart, and you'll see a warning that all the variables are going to be lost. So I do a restart, and now it restarts everything and everything is clean again. Then we can just run all of the code, and things look as they're supposed to again.

So you may think: okay, I just don't use global variables and then all is well. Well, actually, that's not entirely true, because imports, for example, are also part of the global state. Let's say I remove this random import right here and then run this cell again. If I go here, the linter says that random doesn't exist, but when I run this, it actually runs without any problem at all, because the global state still contains that random import. That can be annoying, because you may think, okay, this notebook seems to work fine, so you give it to somebody else, and then that person is not able to run the code because there is a missing import.

I'm curious: have you encountered these types of issues with notebooks yourself? Have you encountered other issues that I didn't talk about? Let me know in the comments.

One thing you can do is combine Jupyter notebooks with custom Python scripts. For example, here I've created a separate script called dice.py, in which I put this function roll_n_dice. We can even use partial function application to get a partially applied version of that function. If you don't know about partial, it's a really cool feature of Python: it allows you to take a function, already apply some of the arguments, and get back a new function. This function, roll_n_d6s, is called exactly like this one, but it already supplies the
number of sides as being six. I use this quite a bit in my code, so if you want to learn more about partial, check out the video at the top. Then what you can do in the Jupyter notebook is simply write from dice import roll_n_dice, and now, because I've imported my Python script, I can simply call that function in the rest of my notebook. No, thank you. It is a great experience, though.

Separating bigger pieces of code from your Jupyter notebook into separate scripts actually has a lot of advantages. You can use, for example, auto-formatters or linters, or check differences between versions of your code more easily. Lots of things get nicer once you move code to a full Python script. Another nice thing about putting your code in a Python script is that you can run unit tests on it, which you can't really do with functions that you define in a Jupyter notebook, at least not in a way that I know of. By separating the more complex logic from your Jupyter notebooks, your notebooks also become smaller and easier to manage. So there are lots of advantages in doing it that way.

So, in summary: Jupyter notebooks are great for more exploratory programming. If you need to look through your data and quickly see what's going on, they're really great for that. And since you can run the cells in arbitrary order, it's very flexible. But on the other hand, that can also lead to a lot of confusion if you're not careful, and that especially has to do with the global state. If you notice that this is starting to happen, maybe it's an idea to move some of your code to separate Python scripts, where you have more control over the complexity, and then import that in your Jupyter notebook to keep things simple and clean. That way you can have your cake and eat it too, or throw it into someone's face, or do something else with that cake. I actually don't even want to know what you're going to do with that cake.

Anyway, once part of your code is in separate Python modules, you can use all the tools
that are available to us, like linters, auto-formatters, and type checkers. And by the way, I'm a really big fan of using type annotations. If you want to know why, you should watch this video next, where I give you five reasons why I think you should use them all the time, all over the place. Thanks for watching and take care.
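
Editor's sketch: the dice module described in the captions is never shown in the transcript itself, so the following is only a rough reconstruction of what such a module might look like. The function names follow the spoken description; the `rng` parameter, the `roll_n_d20s` helper, and the `most_common_total` helper are assumptions added for illustration and are not from the video. The point it illustrates is the one made above: the number of sides is passed as an argument rather than read from a notebook-level global, so rerunning cells in a different order cannot silently change the result.

```python
# dice.py (hypothetical reconstruction, not the video's actual file)
import random
from collections import Counter
from functools import partial
from typing import Optional


def roll_n_dice(n: int, sides: int = 6, rng: Optional[random.Random] = None) -> int:
    """Roll n dice with the given number of sides and return the total.

    `rng` is an assumed extra parameter that allows reproducible runs;
    when omitted, the shared generator of the random module is used.
    """
    source = rng if rng is not None else random
    return sum(source.randint(1, sides) for _ in range(n))


# Partially applied versions: same call style, number of sides already filled in.
roll_n_d6s = partial(roll_n_dice, sides=6)
roll_n_d20s = partial(roll_n_dice, sides=20)  # assumed helper, for symmetry


def most_common_total(n: int, sides: int, trials: int = 10_000,
                      seed: Optional[int] = None) -> int:
    """Simulate `trials` rolls of n dice and return the most frequent total."""
    rng = random.Random(seed)
    totals = Counter(roll_n_dice(n, sides, rng) for _ in range(trials))
    return totals.most_common(1)[0][0]
```

In a notebook cell this would be used as `from dice import roll_n_dice, roll_n_d6s`, after which `roll_n_d6s(20)` and `roll_n_dice(6, sides=20)` make the difference between the two experiments visible at the call site. Because the module is a plain Python file, it can also be linted, formatted, and unit tested outside the notebook.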
Info
Channel: ArjanCodes
Views: 33,546
Keywords: python scripts, jupyter notebook, jupyter notebook tutorial, jupyter notebook tutorial vscode, jupyter notebook python, jupyter notebook vs python script, python script, data visualization, data visualization python, data engineering, data analytics, data science, python tutorial, jupyter notebook mac, jupyter notebook online, jupyter notebook install mac, jupyter notebook vscode, data scientist, data science for beginners, visual studio code, data science programming
Id: 0Jw8seqai18
Length: 13min 7sec (787 seconds)
Published: Fri Aug 18 2023