How to use Jupyter notebooks for data analysis (2024 tutorial)

Captions
If you're looking to get into the world of data science in Python, there are a few things you absolutely have to know, and one of them is Jupyter notebooks. Jupyter notebooks allow you ultimate control over what you do. You have an interactive series of cells that you can edit and modify as you go along. You can perform non-destructive editing using these cells as well, using the advanced preview features. On top of all that, it actually provides really nice error messages for you to quickly debug your code. In this video I'm going to assume you've never seen a Jupyter notebook in your life and walk you through how to create your first notebook, how to create markdown and code cells, and teach you how to use all the different parts of the interface, as well as cool things like magic commands and graph rendering.

Of course, if you find this video helpful at any point, then consider liking it to let me know, and maybe subscribe if you want to see more videos like this. If you're feeling particularly generous, you could support me further by becoming a member or a patron. All the information you need is in the description below. With all that out of the way, let's learn how to use Jupyter notebooks.

So to get started with Jupyter, what you need is a notebook interface, and you could do so on Jupyter's website: you come to jupyter.org and they have their own interactive notebook environments. What I'm going to do instead is use Visual Studio Code. It has built-in Jupyter kernel support. I imagine a lot of other IDEs do as well; I'd be very, very surprised if PyCharm didn't, for example. So you can use your preferred IDE if you want, but I'm going to be using Visual Studio Code. The differences between all of them are extremely minimal, if there are really any at all. It all works the same, it's all the same standard, so it's all fine.

So what we're going to do is go over here and create a file. I'm just going to call it notebook.ipynb, and that is the extension, so IPython notebook. Jupyter runs on top of something called IPython, which itself runs on top of CPython. It's not an implementation of sorts; it's simply an interactive Python shell that sits on top of it, and it provides some nice little features as well that are very useful for something like this.

So we get given a cell by default, but before we even do that we need to install our kernel. If you're on the Jupyter online editor you don't need to do this. If you're on Visual Studio Code, you can come into the terminal (I've already got a virtual environment and everything set up) and do pip install ipykernel, like that. It's "kernel" with an "e" at the end, not an "a"; I kept getting confused with that for a long time. It installs everything it needs, including IPython as you can see, and all sorts of other stuff that it wants as well.

Then we can go up to the corner here. Yours might say "Select Interpreter" or "Select Kernel". Come in here, and then yours might have, like, environments or something. Oh yeah, this is the menu you get: you get "Python Environments" or "Existing Jupyter Server". You want Python Environments, and then you want to set your Python version. So I'm going to set it to CBR Jupyter, which is the virtual environment I set up for this video, and that will then allow you to containerize all your things. Again, on Jupyter Notebook you'll be given a Python version; I'm not really sure if you can select your Python version straight up. That's one of the nice things about an IDE: you can use whatever is installed on your system.

So the first thing we're going to do is show the markdown. I'm actually going to get rid of this cell and create a new markdown cell here, because in notebooks your cells can either be code or they can be text. This is particularly useful for education settings or research settings, or even just
you know, something to put notes in. This supports full Markdown, so I can have a title saying "This is the title of my notebook". If we save that and then run it here, it will connect to the kernel and it should then render it. Nope, it's this tick. There we go, I'm getting ahead of myself. So we can see that this title is rendered. I believe it does tables and all sorts; I think it's fully Markdown compatible, so you can put whatever you want, and you can have as many code and markdown cells as you like. So that's pretty nice. If you ever want to edit it, you can come up here to this pencil button for "Edit Cell" and just edit what's in here. So I could have a little subheader that says "And this is a subheader", and then you can click this tick and it will save it there. All good.

We can create a code cell by clicking this button here. You may need to install some things before you do stuff. If you want to use pandas you might want to install that; you might want to install seaborn, which sits on top of Matplotlib. You could do that in the terminal if you want, or, if you want to make sure that other people can run this notebook, you can use something called magic commands. Magic commands start with a percent sign, and the one we're going to do is %pip install. I'm going to install pandas, I'm going to install PyArrow to prevent a deprecation warning from coming up, and then I'm going to install seaborn as well. If you don't know what seaborn is, I made a video about it a long time ago, but it's basically just Matplotlib but nicer.

Then to run the cell we have a few options. We have this "Execute Above Cells", which will execute everything above. We have this one here, which executes this cell and everything below; this one, which splits the cell, which I don't find very useful; we have delete the cell; and we have a few other options here. And on the side we can just run the
cell like this, or, as you can see, we have a little keyboard shortcut (I'm pointing to the screen so you can see that). On Windows it'll be Control+Enter, on Mac it would be Command+Enter, and on Linux it will probably be Control+Enter, or it would be, what would it be, I forget if it's Super+Enter or Meta+Enter. I forget the terminology. But I'm just going to run it like that, and now that will install all of these libraries into the virtual environment that we've selected, so it'll install them into whichever environment has the kernel in it. Once that's all done I'll be able to show you in the terminal. It takes a little while to do. There we go, it's just finished. Do pip show pandas and it's here, so we already have it installed. So that's how you can install packages within the notebook, which is particularly useful.

OK, another code cell. We can do another magic command. Another one that's particularly useful is %matplotlib inline. This is more just a standard thing; it makes sure that the graphs and stuff that you use in Matplotlib are rendered properly. There are other options to this as well, but a lot of them don't seem to work with Visual Studio Code for whatever reason. I'm not sure if that's a problem with this or a problem with Jupyter; I believe it's a problem with VS Code. So that's something to keep in mind when you're choosing which editor to use. Then we're going to run that as well, and while that's running we are going to import pandas as pd and then, oops, import seaborn as sns, and again we can run that. If we ever wanted to run everything at once, we can come up here and click this button, and then everything will run at once. I'll show that off a bit more a little bit later.

While we're talking about running these cells, though, you will notice that these numbers are appearing next to the cells. Not sure how easy that's going to be to see; I might zoom in. We have this number two in square brackets here, and you
have this number three in square brackets here. If I run this again, we can see this goes to number four, and that is the execution order. So we can see that this ran first and then we ran this one afterwards. That's just useful to keep in mind. You also have the time it took to run. Oh, you also have some more detailed stats there; I didn't know you could do that. So you have the time it took to run the cell, which is always quite nice to have.

So if we just load a dataset in, I'm going to use one of seaborn's pre-installed ones (this is why I wanted to use seaborn). We do sns.load_dataset, and we're going to load the iris dataset, and then we are going to do df.head(), which prints the first five rows. Then we're going to run that, and we'll see that the result is actually printed for us. Not only is it printed for us, but it's printed in a nice fashion. That's because the last line of any cell within the Jupyter notebook is output, and we can change the output mode. If we come to these three dots here and click "Change Presentation", we can have it in plain text if we want, and this is now the raw pandas output. What I would recommend is having text/html, which outputs things a little bit nicer. If you want to suppress the output, you can use a semicolon and it will no longer output what was in the cell. I don't find this particularly useful, but sometimes it could be. We'll leave that out for now so we can see what we're doing.

But this also opens the door to do a few nice things. Because pandas doesn't do anything in place unless you tell it to, you can use this output to preview some changes. So if we create another code cell, and say we wanted to do df.drop, let's say we wanted to drop the species column, and we need to provide axis=1, you could use this to actually preview changes. df.drop returns a DataFrame, and so the
DataFrame is printed out to us. You can see the species column is gone. But if I create another code cell and do df.head(), because pandas hasn't done anything in place, it's still there. So we've done some non-destructive editing, and we've previewed what was going to happen. We could do petal_width in here instead, and we can see that species comes back. If you want to save these changes, we can do df = and then preview them with df.head(), and we'll see that the petal_width column is now totally gone.

Let's create another code cell and I'll demonstrate some graphing. seaborn is quite nice like this, because you can just do sns.lineplot, and we provide df, and then we'll have x= say sepal_length and y= sepal_width. If you run that cell, then we get a figure that isn't going to display today. Why is that happening? There we go. So you can change the presentation mode to image. It must have selected the wrong mode. OK, if that happens, just go to Change Presentation and then, I guess, select the Jupyter notebook one. That's the one that should have been the default. I'm not sure why it wasn't; maybe I changed it at some point and didn't change it back, but the Jupyter notebook renderer should be the default. So if that ever happens to you, that's how you go and change it. That will then display your graph here, and if you want to open it in more detail you can click that button there and start moving stuff around, zooming in and all that jazz.

The only thing that VS Code does not support is interactive plots using Matplotlib. So if you were to select the notebook mode here, if I were to do this and then run everything again, I believe in the online notebook that will give you an interactive plot, and I believe if you do it in a VS Code script it'll do an interactive plot as well. On this, for some reason, it won't, and I'm not really sure why. So that's something to keep in mind. And as you can see, we're running the cell
and below, and it goes and runs everything in order and then we get the graph back.

Before we finish up here, I just want to go over the rest of this top row for you. This "Run All" will run every single cell, or yeah, every single Python cell in the notebook, so it'll run all this pip install stuff and then it will run everything down here. As you can see, it's going, it's going, and it's already finished, because all these operations are quite quick. Sometimes it says "Note: you may need to restart the kernel to use updated packages". I find this isn't always the case, but I do find it is sometimes the case. You can do that by clicking this restart button, and you'll see it has "Restarting" down there. Now when you run, say, this last one, it'll give you an error saying sns is not defined, because we've restarted the kernel, so all the memory has gone. It's not super clear about that, because these numbers don't disappear, but there you go. The same thing happens when you click "Clear All Outputs". So this will clear all the outputs. It will clear the numbers this time, but it'll also restart the kernel, so now if you were to do this again, sns is not defined. But all our libraries are still installed, because they were installed to the virtual environment and not specifically to the kernel.

You also have these variables up here, which, if I run everything, might have a slightly more useful output. Yeah, so you can see just your variables. I don't really use this too much, but this is actually quite a useful view, so you can actually see the DataFrame in one view. Oh, that is quite nice. OK, I might have to use that more. Then you have this outline as well, which just brings another tab up here, and this is more for your markdown stuff. If you had multiple subheadings, or an actual document in here, then you can jump to specific parts using this. And then you have these three dots here, so you have export, customize notebook layout, which
just takes you to some settings, I guess, and then you have your breadcrumbs, which are, I'm not sure what those are actually. Oh, it's just this top bar up here. OK, so that's all that controls. All right, fine. You have toggle notebook line numbers as well, so you can put line numbers within your cells if you choose to.

If you have any questions about what you've seen here, or any ideas for videos you want me to do in the future, make sure to leave a comment down below. I read every single one, so your feedback is greatly appreciated. I also want to thank my amazing patrons and members on screen now, especially mazard rashman I thir, for being so generous, and I will see you in the next video, where we talk about creating virtual environments within Python 3.12. So I'll show a few ways. What I was talking about before with this virtual environment that I had set up, I'll show you how I actually set that up. So I'll see you for that.
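
The preview-then-commit pattern described in the captions could be sketched roughly as follows. This is a minimal sketch, not the code shown in the video: the small hand-built DataFrame is a stand-in for seaborn's iris dataset (the column names match, but the values are made up), and the variable name `preview` is an assumption added for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the iris dataset used in the video: the column
# names match seaborn's iris data, but these rows are made-up values.
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 6.3],
        "sepal_width": [3.5, 3.0, 3.3],
        "petal_length": [1.4, 1.4, 6.0],
        "petal_width": [0.2, 0.2, 2.5],
        "species": ["setosa", "setosa", "virginica"],
    }
)

# Preview: drop() returns a new DataFrame and leaves df itself untouched.
preview = df.drop("species", axis=1)
print(preview.columns.tolist())  # no "species" column in the preview
print(df.columns.tolist())       # "species" is still present in df

# Commit: reassign the result to df to actually keep the change.
df = df.drop("petal_width", axis=1)
print(df.head())
```

In a notebook the `print` calls are optional, since the last expression in a cell is displayed automatically; they are included here so the sketch also runs as a plain script.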
Info
Channel: Carberra
Views: 1,253
Keywords: pyfhon, pytho, pytbon, pytjon, ptyhon, pytyon, ptthon, pyyhon, pythn, pythoh, pythpn, ython, pytgon, pyhon, pytohn, phthon, oython, pthon, pyghon, pythoj, pythno, pythkn, ypthon, pytuon, lython, pyrhon, pythom, pythob, puthon, pgthon, python, pyhton, pythln, pythin, pytnon, pyton
Id: bqw5-8f-cEI
Length: 14min 22sec (862 seconds)
Published: Mon Feb 26 2024