Setting Up Your Python Data Science Environment

Captions
Hello there, project data scientists, and welcome to another marvelous Project Data Science tutorial. If you've been with us before, you know that here at Project Data Science we're all about learning through doing, which is why all of our tutorials are hands-on. So I expect you, if you can, to be at your computer following along and setting up this environment as we go through the video. This tutorial is perfect for beginners, but it's also perfect for you if you've just never really found a good data science groove, or if you're missing some of these key pieces like version control, virtual environments, or GitHub, or if you're not satisfied with your code editor.

So let's talk about what we're going to be doing in this tutorial. First, we're going to set up Python. There are several different ways that you can install Python and get it going, but I'm going to show you how I like to do it, which also ties into the virtual environments we're going to be using. We're going to talk pretty briefly about the terminal. Your terminal is where you can launch Python, run Python files, open up your project directory inside of your code editor, and do a bunch of other stuff. It's very powerful, and we'll talk for a minute about that and the terminal that I like to use. We'll talk about the code editor; in particular, VS Code is what we're going to be talking about. I will walk you through some of my favorite aspects of the editor and how I like to use it. We'll talk about virtual environments, specifically using conda. A lot of people don't use virtual environments, or they're confused about how to go about using them, but this is definitely a best practice that I highly, highly encourage. And we'll talk about version control using git and GitHub: essentially, how do you track changes to your code over time and make sure that, even if you throw your laptop out of a window, you've still
got all of your code stored somewhere safe. And finally, we'll talk about Jupyter notebooks, where a lot of data science work has moved these days. Jupyter notebooks are pretty powerful, and we'll talk briefly about how to get those set up. Oh, and one more thing: we will talk about how I like to organize my code and my projects, which is, I think, a very good organizational structure that you can also use to make sure that your code is nice and organized.

So why did I create this video? Well, this slide, I think, shows a little reason why. Nothing against this slide; I think it definitely is useful in certain ways and has a place. But this is pretty overwhelming, right? If you're getting into data science for the first time, or if you've only been in data science for a little bit, this might be what it feels like when you think about how to set up your coding environment. Should I use Python or another language? What code editor should I use out of the hundreds that are out there (not actually hundreds)? What kind of virtual environment should I use? Should I use one of these other online platforms? How do I do all this stuff? This can be incredibly overwhelming, which is why I think that wrapping it into this nice package here, just a few things that get you up and running to a professional level pretty quickly, can save you a ton of time. This is the environment that I use on a daily basis. This is what professionals use, so it's not like this is going to be too simplistic. This is going to set you off with a really great foundation, on top of which you can get into the actual data science faster. That's why I created this, and that's why I think you're going to learn a ton from it and find it really valuable. Let's go ahead and get started.

All right, so first up is the terminal. We're not going to spend a ton of time on this, because this can get
a little tricky between different operating systems, and the default terminal that comes on your computer is probably good enough for you to get started with. But I will go ahead and show you what I like to use. If you're on a Mac or Linux, then this is going to be my recommended path; if you're on Windows, hang on, I'll talk about that in just a second. So if you're on a Mac or Linux, zsh is a really great shell, and Oh My Zsh here is a nice way to install it. This just comes with a little bit more power than your normal shell, like bash, might. I would recommend giving this a try, but you can also stick with the default terminal on your computer. So if you go down here and you see Terminal, and this pops up, you see that Oh My Zsh is actually installed; it's the default terminal, and mine wants to update right now, so I'm going to go ahead and let it update. Your default terminal might be running bash, and that is probably fine as well. So you see here my terminal gives me some nice color schemes, and I've got my conda environment over here on the left, and if I'm in a git repository for version control, it shows that as well. There are some other handy-dandy little tricks that you can use with zsh, so I would recommend installing that.

Now, if you are on Windows, I want to talk about that really quickly. You're probably fine sticking with the Windows Command Prompt or PowerShell. cmd is going to be your basic command prompt, your basic terminal on a Windows computer, and this actually works a little bit better out of the box with the virtual environment that I'm going to show you later. But you can also use PowerShell. I admittedly don't know a ton about the difference between these two, but I believe PowerShell comes with more advanced, configurable programming and scripting and things like that compared with the normal
command prompt. But like I said, this doesn't work quite as well with a couple of features of the virtual environment here. Both of these should already be installed on your computer, I believe, if you have Windows, so I think that these should be a good starting place for you. Try starting out with the Command Prompt and see if that works.

I'm going to mention two other things really quickly for Windows users. One of them is very new as of the time of recording this video: the Windows Terminal, or the Microsoft terminal. I believe that this is supposed to bring together the Command Prompt, PowerShell, and also Linux/Unix-style terminal functionality, all within one terminal. This is pretty new and I haven't used it, but it could be something worth checking out, because honestly a lot of developers use the Linux-style terminal, which is bash or zsh, and it's going to be kind of hard to find programming support for the Windows command line and PowerShell some of the time. This is just due to, maybe, the fact that a lot of the world's servers and websites are run on Linux machines, and a lot of developers use Linux and Mac, and Macs are built on top of kind of the Linux shell; they have bash, and they can install zsh and everything. So it's just going to be a little bit harder to find support for Microsoft stuff, which is why something like the Windows Terminal might help. And that leads me to the last thing I'm going to recommend here, which is the Linux subsystem for Windows [the Windows Subsystem for Linux]. This is a way to essentially install Linux on your Windows machine and then have access to the normal bash- and zsh-style terminal. So if you're on Windows, I'm sorry that I don't have quite as clear a recommendation here, but start with the Command Prompt, and if you want to dig a little bit deeper, I would recommend checking out the Microsoft terminal and the Linux subsystem
for Windows. But either way, let's go back over here to our terminal. I'm going to type clear. I'm in my home directory right now, and I'm going to cd into a Project Data Science directory that I have already created. I'm going to type ls to list the contents of this directory, and I'm going to make a new project directory here just for this video so that we can walk through a few things. So I'm going to make a new directory; let's call it data science environment. All right, so now if I do an ls, you'll see that we have this new data science environment folder here. I'm going to cd, change directories, into that data science environment, and if I do another ls, you'll see that there's nothing in here currently.

All right, and that wraps up the quick little terminal piece of setting up your data science environment. By the way, we'll be using the terminal a little bit more throughout this video, so if you've never used it before, you'll get a sense for some of the things that we do with it. But you've already seen a little bit of it: you can list out your directories and your files, you can make new directories, this is where you can launch Python and get into an interactive shell, this is where you'll run Jupyter notebooks from, et cetera. We'll talk more about things that you do with the terminal as we go through this video.

So next up, let's get Python installed, and let's also talk about virtual environments. We're going to do these at the same time, because the way that I recommend installing Python and your virtual environment manager is through this thing called Miniconda. Miniconda is essentially a smaller version of the Anaconda distribution, and the Anaconda distribution is a pretty popular way to install Python and a lot of the key data science libraries that you use in Python, like NumPy and pandas. Anaconda is pretty big because it includes kind of everything, all of the different
scientific libraries that basically anyone might want to use. Miniconda is a lot smaller, but it still includes the key things that we want: it includes Python, and it includes conda, which is the way that we install new packages and manage our virtual environments. So I would recommend that you go ahead and, depending on your system, download and install Miniconda for your computer, and make sure to do Python 3.7. Pretty much everything has migrated to Python 3.7 these days, unless you're at a company that is still using Python 2.x for some reason. That's pretty much the only reason you would want to use Python 2: if your company is currently using it. But otherwise, download Miniconda for Python 3 and get it installed.

Now, once you have that installed, you should be able to come back over here to your terminal. I'm going to exit out and start a new one, just because you might also have to do this to refresh your terminal after installing Miniconda. Let's go back into the folder that we were in, and you should now have access to Python. You can test that just by running python and seeing if it works; you should see Python 3.7 or Python 3.8 or something like that. Also, if you type which python (I have Anaconda installed here), you'll probably see miniconda/bin/python or something like that. The which command just tells you, essentially, if I run the command python right now, what is it actually going to run? Where is the executable, the application, that is going to get run when I run python? In my case, currently, if I run python, it will run this anaconda3/bin/python. This is an important note because of what we're about to do, which is use conda to create a virtual environment. So let's pop over here to Google really quickly. I'm going to Google virtual environment Python; I want to find a quick...
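[Editor's sketch] The `which python` check described above can also be done from inside Python itself. This is a minimal illustration, not something shown in the video; the helper name `describe_interpreter` is hypothetical, and the paths it reports will depend on your own installation.

```python
import shutil
import sys


def describe_interpreter():
    """Return details about the Python interpreter running this code.

    `describe_interpreter` is a hypothetical helper name used only for
    this illustration.
    """
    return {
        # Full path of the executable that is running this script
        "executable": sys.executable,
        # Major.minor.micro version of that interpreter
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # First `python` found on PATH (what `which python` would report),
        # or None if no command named `python` is on PATH
        "python_on_path": shutil.which("python"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running this before and after activating a conda environment should show the executable path changing from the base installation to the environment's own directory.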
All right, so this image is kind of small, but it demonstrates what a virtual environment is good for. Let's say that you download Miniconda and install it, and you have a version of Python installed now. And let's say that you install some packages using Miniconda into your main Python installation: you install NumPy, you install pandas, and you install a bunch of other stuff from the dark corners of the internet that you think is going to be helpful for your data science project, or whatever it is that you're doing. Now let's go back to the example from before: you go to a company, and that company has some of their code in Python 2.7. Well, now what do you do? Do you uninstall Python on your system and reinstall Python 2.7? That's going to break everything that you've been writing in your newer version of Python. So how do you solve this problem? Or let's say you want to install two different versions of a package like pandas: you want the really new version with some cool experimental features, but you also want to make sure that you have a stable version of pandas to run some of your previous code with, so that things aren't breaking. How do you get around that without having, like, a hundred computers sitting around so you can install a hundred different versions of Python and these packages on all of them?

Well, that's where virtual environments come in handy. A virtual environment is essentially, just like this image shows, two different versions of Python installed in two different directories on your computer. It keeps them totally separate: you can install different versions of Python, and you can install different versions of Python packages. So this is definitely a best practice, and I'm going to show you how to do that the easiest way
right now. So conda, which you just installed with Miniconda, is the package installation and virtual environment manager. If we wanted to install a new package like pandas, we could do conda install pandas, or conda install numpy, or whatever. And if we want to create a new virtual environment, we also do that with conda. The format for that is conda create (so you're creating a new virtual environment), then -n, and then we're going to give it a name. We'll say data science env, env for environment. So you give it a name, and then whenever you're creating it, you can go ahead and pass it whatever you want to install. Let's install Jupyter to start out with, we'll install pandas, and maybe we'll go ahead and install matplotlib and seaborn, which is another data visualization library that I like. So I'm going to hit enter here, and now conda is going to figure out what it takes to install these packages that I've just requested, and it's going to go and install a whole new version of Python into a different directory, along with these packages.

All right, so this says solving environment, and it's going to ask us whether we want to proceed with installing these packages. You might be wondering, hey, why are we installing like a million packages here? I just asked for four packages. That's because these four packages each rely on other packages. Let's take a look here: pandas might rely, for example, on pandoc and pandocfilters, and matplotlib might rely on PyQt, for example. You'll see up here that these are the things that we've asked to install, and these are the things that are going to be installed because they are dependencies. And here's a key thing as well: you'll see this environment location. This is going to create a new virtual environment under the envs directory, and then under data science env. So this is creating a whole other directory where it's going to install Python, and it's going
to install all of the files for these packages. So let's go ahead and say yes here: type y and hit enter. I will give this just a minute to install, and then we'll be back.

All right, so you'll see that this has now finished, and it helpfully gives us this little command here: to activate this virtual environment, we just use conda activate data science env. This is the part where, if you're using Windows, there's a little bit of difference between PowerShell and cmd, the Command Prompt. I have had more luck in the past, when I've had to do this, with the Command Prompt, but play around with it and do a little bit of googling to see which one works best for you, or try the whole Windows/Microsoft terminal and the Linux subsystem.

But okay, for now, remember that if I type which python, this will show me where this Python is going to be executed from. So if I do conda activate data science env, and now I do which python, you'll see that the version of Python that we're now going to run is the one in this virtual environment folder that we just created: data science env/bin/python. So if I run python right now, you'll see that we actually have a different version of Python than I had a minute ago when I first ran python in my base conda environment. I'm actually using Python 3.8.3 now, so I'm using a slightly newer version of Python. If I import pandas as pd, which is how you typically import it, you'll see that it imports successfully. If I try importing, let's try requests: we didn't install requests, and it doesn't come as part of the Python standard library, so I don't have access to it. But if I go back out, if I do conda deactivate now to deactivate my virtual environment, and I type which python, you'll see that I'm running my base installation of Python now. I think I have requests installed here; I'm not positive, but let's take a look. Yep, there we go. So you'll see that not only is the
version of Python different, because we're running Python 3.7 here and we're running it from this folder, but the Python libraries that we have installed are also going to be different. I have requests installed in my base installation, but I didn't have it installed in my data science env virtual environment. So you can see that these are two totally separate Python environments, which is the best practice. Anytime you start a new project, I would recommend creating a new virtual environment like this. I'm going to go ahead and reactivate my virtual environment, and we can move on to the next step: our code editor.

Code editors are one of those subjects where people can get into fights, because people have strong feelings. But I have seen a lot of movement in recent years towards this editor, Visual Studio Code. Not everyone is using it, of course, but a lot of people are. I use it personally, and I've used different editors before. This is a great code editor. It comes with a lot of functionality right out of the box, and then you've got these extensions that you can use to install pretty much anything else that you want. It's nice and easy to use, very clean, very beautiful, and streamlined. So I would recommend starting out with VS Code, unless you already have another editor that you prefer. Go ahead and download and install VS Code, and then I will show you some of my favorite bits of functionality.

After you have VS Code installed, you can go back over here to your terminal. You might have to exit and come back in again; I'm not sure. After that, if you come into your project directory that we created and type code, for VS Code, and then a period for the current directory, what you're doing is telling VS Code to open up a new window inside this current directory, to say, hey, this is my project, I want to do some
work on my project, let's open up this folder. When I first open it up, you'll see that there's kind of nothing going on here. Of these tabs on the left, this top one is the file explorer. Data science environment is the folder we're currently in. I'll go ahead and right-click here and create a new file, just so that we have some file to look at. I'll call this test.py, for Python, and you'll see that it immediately recognizes it as a Python script: it pops up this Python logo right here. Here is our file, and down here in the bottom right you'll see that the syntax highlighting, or the language mode, is Python. If we start typing things like import pandas as pd, it does syntax highlighting for us, so that is pretty cool.

Now, we've got some other things over here on the left. You've got search; this one is for source control, which we'll get to in just a little bit; you've got a runner and debugger; and then this is the main place that I want to point you to right now, the extensions tab. This is where you can install anything that you want that does not come with VS Code right out of the box. For example, I have an additional Python extension installed here, which helps with some things like linting and debugging. I also have a Docker extension installed, which gives me this extra little tab where I can check out my Docker containers and see if they're running. If you don't know what that is, no worries. This also has various extensions for databases: if there's a certain database that you connect to, you can find an extension for it. All kinds of cool things there, so you can browse around for these.

Let me show you a couple of my other favorite parts about VS Code. If I go up here to Terminal, we can actually launch a terminal directly in VS Code. And not only that; well, let's see. It's automatically activating this other virtual environment for me right
now, which is controlled down here on the bottom left. I am going to change this to data science env, so I'm changing the default virtual environment for this directory. You'll see that VS Code just popped up a little folder with a settings file that basically says, hey, this is the virtual environment I want to use for this directory, so it will automatically remember that whenever I start new terminals in the future. Let's just go ahead and try that really quick: let's start a new terminal and see if it does the right thing. Yep, there we go. It automatically activates our correct virtual environment for us.

So we can have a terminal here, and this is very helpful. Let's say I want to type up some Python here in a script and then test it. Down here in the terminal, let's take a look in our directory: test.py. So I do python test.py, and it prints hello world. So all in one window, we have a way that we can write code, and we can also run it down here. I can also open up IPython down here and have an interactive Python shell open where I can test things out while I'm creating scripts. Let's say there's something I don't know how to do, or I'm confused as to what the results are going to be: I can have IPython running down here, test out whatever I want to test out, and then move it up here into a more permanent script.

Not only that, I can go to Terminal, Split Terminal, and now I've got two terminals going. Look at that, that's pretty crazy. So now not only can I experiment with Python on the left side, but I can also run my Python scripts on the right side, and I can do anything else that I might want to do with the terminal over on this right side as well, like version control using git, which is something we will get to in just a minute. So these are some of my
favorite features of VS Code. Obviously there's a lot more that you can discover by exploring, but if nothing else, I would start out with this: open up your project directory inside VS Code, have your scripts and whatever else you want open, and then, if you need it, have an interactive Python terminal open over here and just a normal terminal on this side. So I'll go ahead and exit out of this terminal, clear that, and we'll keep this around for later usage.

Oh, one last thing that I want to mention before we move on is the Command+Shift+P shortcut. If you're on Windows, that might be Control+Shift+P for you. Command+Shift+P opens up this search function, essentially, where you can search for any functionality that you want. So if I want to change the syntax, let's see, I think I might be able to do that here: syntax... maybe not. Actually, I usually change the syntax down here in the bottom right. But one thing that I use this for, for example: let's say you've got some text and you want to change it to uppercase. Command+Shift+P, search for uppercase, Transform to Uppercase, boom, there you go. Let's say I've got some text here; we've got some bins, we've got dogs, whatever, this doesn't make any sense, but apples and elephants, and I want to sort these. Well, does VS Code have a sort functionality? Let me check: Command+Shift+P, sort, Sort Lines Ascending, boom, there we go, and now all of our text is sorted. So anytime you're looking for something and you're like, huh, I wonder if I can do this, check out Command+Shift+P and see if it's in here, because you can scroll through and see there's a ton of different stuff that you can do.

All right, and actually, I lied: one more thing that I'm going to show you before we move on, and that is multiple cursors. If you click somewhere and then you use the, on a Mac it's
the Option key (I'm not sure what it would be on Windows), but use the Option key and click, well, now I've just created multiple cursors, and I can type in multiple things at the same time. I can even copy and paste multiple things at the same time. So I can do stuff like this. Or if I was typing some Python and I wanted to turn this into a dictionary or something, maybe I create a dictionary up here and then I create multiple cursors: Option-click, Option-click, Option-click. I'm going to select these four lines, I'm going to use the double quote to wrap these in double quotes, and I'm going to set some kind of key equal to this, maybe just the first three letters here or something like that. And there we go: with not that much typing, I was able to turn four different lines into actual code by typing something on every single line. So multi-line select is very, very useful. You can do Option-click; another way to do it is Command+Option and then, for example, the down arrow, which will move your cursor down and do multiple selections. You can also (and this is probably too much information at this point, but I want to show you just because it's cool) select a word and then do Command+D, and that will select the next instance of the word, and then you've got a multiple cursor there. Or let's say that I want to select this equal sign: then I can do Command+D, and now I'm selecting all four of these space-equals-spaces, and then I can go here and edit stuff however I want to. So those are a couple of tricks that will save you a ton of time as you are coding in VS Code.

All right, just a few more things to go here to finish getting your environment set up, and a few more things to walk you through as far as using all of this together. Let's go back over to Chrome here, and we're going to talk about version control now. Version control is essentially a way to, well, this is a pretty deep topic, so you're going to want to
do some research, but it's essentially a way to keep track of each time your code changes. You know how you can save a document, and that creates a little checkpoint, or makes sure that your edits get saved into the file? Version control is like that, but a lot more powerful. Not only can you save your code like we might do with a file, but you can actually take snapshots of it at any given point in time, so that if you ever want to get back to, let's say, this exact file, you can do that. And if you're collaborating on code with a bunch of other people, let's say I'm working on my function here, and someone else, this is like Sarah's function over here, and Sarah is trying to work on this, we can both work on code on our separate computers and then eventually combine them together in the same file. Version control is what helps us do that.

So I am going to recommend that you install git, and then get an account set up on GitHub. First things first: if you Google github install git, there is this Set up Git page. GitHub basically has a lot of really helpful resources for using git. Just to explain really quickly what this is: git is the underlying software that does the version control, and GitHub is a website where you can store your code that has been tracked using git. So GitHub is basically a way to help you track your code and store it online using git, but git is the underlying software that you're going to be using on your computer to actually track everything.

So step number one would be to download and install the latest version of git. If we click on that link, it should be pretty straightforward: you just download whichever version is for your operating system and get it installed. Then I would come back to the setup page and, let's see, set your username, so you can set a git username here, and then set your commit email address, so
do git config. These are things that you do via the terminal, so I would just follow these instructions on GitHub: download and install git, follow these other two instructions, and then go ahead and create a GitHub account as well.

Now, once you have git installed and you've set up a GitHub account, come back over here to your environment. The way that we start tracking our code for this project is, in a terminal, we type git init. You'll see here: initialized empty Git repository. What that basically means is, hey, git is going to start tracking your code changes now, whenever you tell it to. Git does not do anything unless we tell it to, so let's tell it to do something.

First things first, I'm going to create two files that you'll typically find in most git repos, or repositories. That's the README here, and this is going to be something like this: my data science environment, this is a helper README file with information about what is in this project, and this is what will be displayed on GitHub. This is essentially just a text file that you can format, so this little hashtag, the pound symbol there, means give me a big heading. This is just a way of writing up some information about what is in this repository. So the README is one file that you'll typically find in every git repository, and the other is this thing called a .gitignore. The .gitignore is where you put any directories or files that you do not want to track using git. In this case, I do not want to track this .vscode directory, because it just has settings that are relevant to my specific computer and to the fact that I'm using VS Code. If I want to share this code with other people, maybe they're not using VS Code, and they're not going to have the same path here that I would for my environment. So maybe I want to
ignore this folder so what i do in here is i would just type .vscode to say hey git ignore this directory i save it and then you'll notice here very helpfully on the left hand side vs code has grayed out this directory so it's essentially telling us hey git is not going to track this directory now i come back over here to my terminal i've initialized an empty git repository so we have the ability to keep track of changes now i'm going to run git status to take a look at everything that's changed well we have three files that we have not started tracking yet so i'm going to add those files and i'm going to add all of the changes that have been made in them by using git add with the period notation which just means add everything inside of my folder now if i run git status um so we haven't actually taken a snapshot yet we've just told git that we're thinking about taking a snapshot and that these are the files that we are thinking about taking a snapshot of in order to track their changes so changes to be committed committed here means basically taking a snapshot and now i type git commit and i'm going to add a commit message for this commit so i'm going to say essentially what am i doing well i'll just call this my first commit so i commit it now we have taken a snapshot of these three files at this specific time when they look exactly like this now the last piece of version control that i'm going to show you really quickly is if we want to store this code now and we want to store the history of all of these snapshots so like let's say you know that i make a change here maybe i'm going to add you know maybe mike's function and i save it so now i run a git status you'll see that test.py has been modified so i'm going to stage it i'm going to get it ready to take a snapshot by adding it to the staging area and now i'm going to run git commit and i'm going to say add a placeholder function for mike all right and
i've just taken another snapshot of our file here and if i run git log you'll be able to see two different snapshots these just uh list the um messages that we wrote here for the snapshots but we get to see you know when was the snapshot made who made the snapshot what was the message etc so i'll type q to get back out of here and the last thing we're going to do here with version control is push this code up to gitlab or sorry github gitlab is another piece of software so we're going to make sure that this code gets pushed to github so that it is stored safely online so if i were to throw my computer out a window my code would still be safely stored on github so let me go here i'll go to github.com i am going to go up here to the top right and create a new repository from that plus symbol the repository name let's create uh let's say demo data sci um project demo data sci project and i can say you know this is a demo data science project all right i'll leave this public for now that's totally fine you can also make it private if you want and then we have already created a readme and a .gitignore so we are going to skip both of these i'm going to create the repository so this is a place where we can store our code that we just created on our computer so github tells you how to do this which is very nice we already have an existing repository so i'm going to copy this line here so git remote add origin and basically this just tells git where to send the code to and in this case we're sending it to this link or to this git repository and then we use this git push line here to push up our code to github all right looks like it pushed successfully if you're doing this for the first time you'll probably have to enter your username and password or actually maybe just your password here so you'll see that after a bunch of lines we just pushed up our code to github i come back over here i refresh my tab in
chrome and here are my three files my .gitignore let's go in there so you just see .vscode which is the same as the file we have on my computer my readme and you'll notice actually this readme section down here so what this does is this readme.md md for markdown creates a nicely formatted little description down here on the repository itself so that you can uh give information about what the repo is let's actually try to find something really quick um let's say maybe python pandas github so if we go to the pandas repository and scroll down you'll see that they have all kinds of stuff they've got a nice image up here they've got all these little badges they've got a description of what pandas is they've got um a list of features and links they've got how to install it so you can do all of this kind of stuff inside of your readme.md file and then finally we have our test.py our python script here and you'll see that this is what we created so there is a ton more to learn about version control but that's version control in a nutshell how you get it set up how you start using it and this is definitely something that pretty much all data scientists are going to use they're going to use git and they're going to use github or something like github but very very often github so i just got you started with it but i would suggest learning more and using it you know you learn through doing so just uh just dive in all right and i believe we have one or two more things so first we'll look at jupyter notebooks and then i'll show you how i like to start setting up my project directory structure over here to keep everything nice and organized and then i think that we might be done so jupyter notebooks uh we installed jupyter when we created this virtual environment so if i type which jupyter you'll see that the jupyter executable the jupyter application here that we're going to run is inside of our virtual
environment folder which is exactly what we want so the way that i run jupyter notebooks is you just type in jupyter notebook and then i like to use this little ampersand after jupyter notebook and what this does jupyter notebook has to keep running inside of your terminal and if you uh if you don't run it with the ampersand then this terminal essentially becomes unusable because jupyter notebook is occupying the whole thing it's running inside this terminal if you use the ampersand this runs it as a background task or a background job and then you can keep using this terminal for other things like version control so i run that jupyter notebook launches over here on my other screen so i'm going to drag it over and here we go this is what jupyter notebook looks like when it launches and from here you can go over here to this right hand side click new python 3 notebook and now we can say my new jupyter notebook you have a new jupyter notebook where you can write code like print hello world and you can also create markdown just like the markdown in github so you can say uh here you can say you know my project here is some information about my project and then you can do python code and uh not very complicated python code that we're doing right here but you know just to get you started to get you into it so i'll go ahead and save this notebook here close out of it you'll see that we have a new notebook created right here i can click on it and shut it down it's not necessary as long as you save your notebook but you know you can basically just say hey i'm not using this notebook anymore let's go ahead and shut down the python kernel that's running inside of there so your version of python that's running in there i'll go ahead and come back over here to vs code and my terminal you'll see a bunch of output on the terminal from jupyter notebook talking about what jupyter notebook is doing but if i hit enter if i hit return here you'll
see that this terminal is still usable and that's what the ampersand did at the end of jupyter notebook up here it made it so that this terminal is still usable now let's do a couple of other things to wrap up here and to get our project directory looking good i'm going to shut down jupyter notebook and the way i do that is i'll type jobs here to see that jupyter notebook is in fact running and then i'm going to type kill and then the percent sign and then one and that is going to kill this first job here and you'll see okay the notebook is shutting down the job is killed so if we type jobs there are no longer any jobs running so jupyter notebook is shut down if i come back over here to where our jupyter notebook was running you'll notice localhost this just means it's running on your computer at this port so if i click refresh well we just shut jupyter notebook down so this makes sense that we can no longer access it all right i'll come back over here let's check our git status so git status well we just added a new jupyter notebook so that's pretty cool we also added this checkpoints directory this is another thing that i don't necessarily want to track in version control so i'm going to add this directory to my .gitignore now i run git status you'll see that i've modified my .gitignore file and i've added a new jupyter notebook file let's just add everything by using the git add period well actually i'll add these one at a time so i'll add .gitignore i will commit that with a message that says add uh add jupyter notebook checkpoints to .gitignore i'll run git status again and you'll see that okay we just took care of snapshotting our .gitignore file but we still need to track and snapshot our notebook so i will add our notebook here run a little git status we have staged that it's looking like it's ready to be committed so git commit and we'll say first commit of new notebook and i'm going to
go ahead and git push origin master to get my code onto github let's go ahead and come over here and uh let's just take a look i think actually we have it already open over here yeah we do all right so i'll refresh my repo here and you'll see that there we go we have our new notebook up here so that is nice and tracked using git and github all right one last piece of information here and then i think we will call it quits for the day and that is how to set up the directory structure for a new data science project so i will show you one of my favorite resources and that is cookiecutter data science so if you google cookiecutter data science you'll find this page here which is basically just as it says a logical reasonably standardized but flexible project structure for doing and sharing data science work i do think this is a pretty nice project structure so i use pieces of it i typically don't use the full thing here but i use pieces of it and i'll show you the main pieces that i like to use so this is showing the directory structure as well as what the directories are used for so for example we have a data folder up here which is used unsurprisingly for your data you've got the raw data you've got your data from third-party sources etc you have a notebooks folder so this one is going to be used for your jupyter notebooks and then you'll keep a lot or you know almost all of your python scripts you know the non-jupyter-notebook python code that you write inside of the source folder src and your source folder is broken up into different subfolders based on what the code does so data features models visualization etc so let's come back over here to vs code and i could do this either in the terminal or over here on the left maybe let's do a mixture so i'll right click i'll say new folder let's create the source folder and let's move the test.py into the source folder there we go so our source folder now has test.py in it now i'll come down here to my
terminal for this next one let's make a new folder or a directory let's call it notebooks and you'll see the notebooks folder pop up here on the left and now let's move with mv the jupyter notebook into the notebooks folder and you'll see that up here on the left the notebook just popped into the notebooks folder here and as a very last thing let's just mkdir data so let's make our data directory and our project structure is looking pretty good here pretty good and you know let's just create a little placeholder file you know we don't actually have data here but you know if we had a csv file maybe like mydata.csv so you'll see now that these folders are green which means that they have been changed since we have taken a snapshot with git so if i do git status you'll see that it thinks that we've deleted a couple files we haven't deleted them we've really just moved them into these untracked folders so let's go ahead and add everything and then git status all right well now git is pretty smart actually so it just uh told us essentially that it thinks we renamed or we moved this file into this folder which we did so that's pretty smart and we also moved this file into this folder and we're also going to track our data well we actually usually don't want to add our data to version control because data can get pretty big so i'm going to add our data folder to .gitignore but we have a problem and that's just that we added our data folder already we already staged it to be snapshotted so let's go ahead and do a git reset so i'll do a git reset and that unstages everything so now if i do a git status you'll see that our data folder no longer shows up as untracked because it's being ignored by our .gitignore so if we add everything and then run a git status you'll see okay we renamed this we renamed this and instead of our data folder here we see that we modified our .gitignore file to include the data folder here so i will commit
this and we'll say create a new notebook and organize the project i'll push this on up to github come over here to github take a look and make sure that everything got organized there it is we're looking pretty good and with that i think you now have a really solid working data science environment so this is what i use on a daily basis i use zsh as my terminal i use vs code to do my code editing i create conda environments and install things there i use git to do my version control i use jupyter notebooks to do like data exploration and modeling and things like that and i use this cookiecutter data science project directory structure to keep everything nice and structured so you now have a beautiful data science environment that you can use and not have to worry about you know a billion different ways to set this up so i think you're in a really good place um i hope that this really helps you get started with data science and i hope that you have fun learning because you know data science just like any other technological field these days is constantly evolving there is always something new to learn and you know we're going to be learning new things for the rest of our lives so enjoy that process and i hope that this has helped to get you started all right thank you so much for being here hope you've learned something and we'll see you later
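As a recap of the project layout built by hand in the video (an src folder, a notebooks folder, a data folder, a readme, and a .gitignore), here is a minimal Python sketch that creates the same skeleton. It is not what the video does: the video uses the VS Code file explorer plus mkdir and mv in the terminal. The function name scaffold_project and the demo folder name are hypothetical, the readme text is paraphrased from the captions, and the ignore entries assume that the spoken "dot vs code" and "checkpoints directory" refer to .vscode and .ipynb_checkpoints.

```python
from pathlib import Path
import tempfile

# Entries the video adds to .gitignore over time. The exact spellings are
# assumptions based on the spoken captions.
GITIGNORE_ENTRIES = [".vscode", ".ipynb_checkpoints", "data"]

# Readme text paraphrased from the captions; "#" makes a big heading in markdown.
README_TEXT = (
    "# My data science environment\n\n"
    "A helper readme file with information about what is in this project.\n"
)


def scaffold_project(root):
    """Create a small subset of the Cookiecutter Data Science layout.

    Hypothetical helper: it only creates folders and two text files,
    and it does not run git for you.
    """
    root = Path(root)

    # src for python scripts, notebooks for jupyter notebooks, data for data.
    for name in ("src", "notebooks", "data"):
        (root / name).mkdir(parents=True, exist_ok=True)

    # Only write the readme if it does not exist yet, so edits are kept.
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text(README_TEXT)

    # Append any missing ignore entries without duplicating existing ones.
    gitignore = root / ".gitignore"
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    missing = [entry for entry in GITIGNORE_ENTRIES if entry not in existing]
    gitignore.write_text("\n".join(existing + missing) + "\n")

    return root


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing on your machine is touched.
    with tempfile.TemporaryDirectory() as tmp:
        project = scaffold_project(Path(tmp) / "demo-data-sci-project")
        for path in sorted(project.rglob("*")):
            print(path.relative_to(project))
```

After running something like this in a real project folder you would still run git init, git add, git commit, and git push yourself, as shown in the video. Running the function a second time on the same folder should leave the readme and the .gitignore unchanged.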
Info
Channel: Project Data Science
Views: 39,472
Id: cn7CnFIQUBo
Length: 63min 52sec (3832 seconds)
Published: Tue May 26 2020