How to Use Git in VS Code for Data Science

Video Statistics and Information

Captions
This is very awesome, so this is a feature of GitLens.

All right, welcome to a new video. My name is Dave, and my goal is to help you level up as a data scientist. In today's video we'll go over what Git is and how to use it in VS Code for data science.

Here's what we'll cover in this video: we'll go over Git for data science, we'll set up a project, I'll introduce GitLens, which is a VS Code extension to basically supercharge Git, I'll show you how to create a repository, and then we'll go over some basic Git operations within VS Code. So this will not be an in-depth guide on how to use Git, but more a guide on how to use Git within VS Code.

All right, so for those of you that don't know, Git is software for tracking changes in any set of files, usually used for coordinating work among programmers collaboratively developing source code during software development. So basically we can keep track of changes in our code. It says that it's used for software development, but of course we as data scientists also write code, so we also want to keep track of our changes and share this code with others.

Why do we want to use Git for data science? First of all, we can keep track of changes to our code. We can test new features without breaking the original code. We can refer back to all the versions of the code if needed. And we can back up our files in the cloud using GitHub, GitLab, or Bitbucket.

A good thing to know, which is something that confused me in the beginning: Git is not the same as GitHub. Git is a version control system that lets you manage and keep track of your source code history, and GitHub is a cloud-based hosting service that lets you manage your Git repositories. So these two are different. You can also use Git to track changes locally on your machine without having to push them to the cloud, to GitHub, and that's something I will show you later in this video. And Git also makes it easy to collaborate with a team by sharing your code and also keeping track of
changes, seeing who did what.

When I started out with data science, I used to do all my projects in Jupyter notebooks. I would just create code, and then I had something working, and then, for example, I wanted to change something, or I would add a new feature or a new function, and then all of a sudden everything broke. And you're like, oh, what did I do? How do I revert this back? You hit Command+Z, Command+Z, Command+Z within each of the different notebook cells, and then you come to the conclusion: I really messed up my code and it's not working anymore. This is where Git can help us. Before we make any major changes to code that is already working, we want to back it up with Git and make sure we track the changes. So now let's hop into VS Code and I'll show you how to set up Git and how to use it within your data science projects.

All right, to follow along you need two things. First, you need Visual Studio Code installed on your system. If you don't have that yet, I would highly recommend that you install it (it's free) and explore it; it's an amazing tool for data science. I also have a video on how to set up VS Code for data science, which I will link somewhere up here. So that's the first thing. Second, you need Git installed on your system, and I will leave this link in the description. Basically, here you can see how to install Git on Linux, on Mac, or on Windows. So make sure you have VS Code and Git installed.

Let me open up VS Code. What I did is I created an awesome new data science project. The folder is here on my desktop, and it's using the project structure that I use for my data science projects. It's currently empty, it's just an example, but this is the folder that we will be using. So let me first open that up in VS Code. I'm here in a new blank VS Code window. I'll open a folder; this is the folder that I want to import, so my awesome DS project. Open it up, I'll allow that, and I'll allow that, yes. So here we can see that we have the
folder now imported within a VS Code project.

What I will then do (these are just some basic procedures that I follow whenever I start up a new data science project) is save this as a workspace. I will save this in the same folder. What this will do is create a .code-workspace file that can store all the settings of this VS Code project. So now whenever I want to open up this project again, I can just open up this file, or go to Recent and then open up this file, and I will have all the folders in here.

Now another thing that we have to set up is an extension called GitLens. You can go to the extensions by clicking on this icon over here and then search for GitLens, and this is the one we need. I already have it installed, but if you don't, it will show a big Install button over here, so you can just click on that. It's free and it will install in a couple of seconds. You might have to reload VS Code, but it will let you know that. I will show later what this does, but as I said, this will basically supercharge Git within VS Code.

So now we're in this project, and let's say, for example, we're here in the data stage and we create a data prep file. Let me do my imports. We need a Python interpreter, so I'll just select my base Anaconda environment. We have some imports; run that. All right, so that fires up an interactive Jupyter session. If you want to know how to do this, I also show that in the video about how to set up VS Code for data science; again, the link will be somewhere up here. And now we're ready to go.

We'll start with our project. For example, we'll start as we data scientists usually do, and we start off with a data frame. This will be a very boring data frame, but just for the sake of demonstration, let's create a data frame like this. Beautiful. So we have a data frame, and all the things that we just did in our file, we want to make sure that we keep track of those, because later, if you want to change
something, for example, and we want to refer back to this awesome data frame, we want to know that these changes are stored somewhere. So how do we do this? Well, on the left over here we have the Source Control icon within VS Code that we can use. On a new project it will say that this workspace currently doesn't have any folder containing a Git repository, and that's because we have to create one. VS Code makes this very easy: we can just hit the Initialize Repository button here.

To show you what this will do, I will open up this folder and quickly show the hidden files. I do that on Mac by hitting Command+Shift+. (period); on Windows this is different. This will show you the hidden files, and as you can see, there's already a .gitignore file in here, because that's in my default project structure, but there's no .git folder here yet. That's basically what this button over here does; it says Initialize Repository. So if we do this, we then have to pick the folder in which we want to initialize it, and I will use the same folder, so our awesome DS project. Now if I go back, what we can see is that we now have a .git folder here. Basically, this folder is used to keep track of all the changes, so Git will do its thing here. Normally this is hidden, so if I hide the hidden files again you won't see it here, but just so you know, this is what's happening in the background: we create a hidden folder, and everything that we do is tracked over there.

So now we have a local repository, a local workspace, that we can commit the changes to. We just created the first file, and what we can now do, if you're familiar with Git, is a commit, and also leave a message for that. We'll just do "first commit", and then I'll hit the Commit button. We can say yes. So now we've committed the changes to our Git repository, and we can keep track of those changes as well.

Now I can also show you what GitLens can do for us. So we've
created a Python script and we initialized the data frame here, and then we committed these changes, and then we added another line. This could be another feature; this could be a change to your code. What we can now do: I'll close out this interactive session, I can click on this icon over here, and we can preview the changes. This is very awesome, so this is a feature of GitLens. After installing GitLens, these icons up here will become available. Here on the left you can see our previous file that we've committed, and here on the right you can see our new file, so what we've added.

Let me give you an example. We'll run it like this, so everything is working. Then, for example, we change some things around, and I want to add a four here, but all of a sudden I make a mistake and I delete a bracket. All right, so we change this, we close this out, and then the next day I come back and I want to run my code, and all of a sudden we have an error. And we're like, what? Yesterday this was working fine; why is it not working anymore? This is obviously a very obvious error, but believe me, this will happen all of the time. You're working on a project, you come back the next day or even an hour later, and things break down because you accidentally hit the backspace or you accidentally typed a comma somewhere, and things just break. It will happen.

So how do we resolve this? I go back and I can see, oh wait, we have some changes here. What I can then do is just take this line over here, for example, and adjust it; I adjust it manually like this, and then we're good to go again. Let me save this, close this out, and here we're good to go again. That is how you can track changes locally.

Another thing that I notice over here: on every line that I now click on, you will get a little annotation like, okay, you made changes here four minutes ago. At times that can be convenient to have there, but I also find it a little bit
distracting to have it always on, and I believe it's the default. How you disable that is you hit Command+Shift+P, or Ctrl+Shift+P on Windows, and then there's a command, GitLens: Toggle Line Blame. It's the one over here; you can also search for it, like "toggle line blame". If I hit Enter, as you can see, that will disappear. So that's one setting that I think is on by default, and I change that to off.

That is how we can keep track of local changes. But now let's say, for example, you want to push this to the cloud; you want to push it to GitHub to share it with your colleagues, for example to work on a project together. How do you do it? We go back to the Source Control icon here in the left menu in VS Code, and first of all it says we have new changes. So, for example, we messed around, but I'll create the second data frame; we'll call this c d. All right, perfect: new file, it has changed, and then we'll hit Commit, and I'll say "added a second data frame", and we commit. Yes, that's all right.

Now that we've committed everything, we have this other big button over here that says Publish Branch, and this lets us push our code, or rather our repository, to GitHub to make it available for our colleagues, or just to store it in the cloud to be more safe with our code. What happens when we click Publish Branch? First of all, you can define a name for the repository. I write my folder names in the same way that I would write my GitHub repository names, so I'll leave that as it is: awesome DS project. Then what we can choose here is to publish to a GitHub private repository or a public repository. This is up to you; typically, for work, most of them will be private, of course. I've already configured GitHub in VS Code to be connected to my GitHub account. Probably, if you do this for the first time, you have to authorize it and
then log in and give permissions, et cetera, but it should be very straightforward. I'll show you how to publish it to a private repository, so I'll select this one and hit Enter. What this will now do: here we get a success message, and we can open this on GitHub. So let's go to GitHub, and as you can see, we're here on my GitHub and we have our repository in here with all the files and also all the commits. As you can see, we have our first commit, and also here the message "added a second data frame". We can check this out here: here we have the data prep file, and as you can see, we have two beautiful data frames here. So that's how you publish your code to GitHub.

Now lastly, I will show you how to work with Git branches using VS Code. Let's say, for example, you're in an awesome project and you want to add a new feature. So, for example, this will be the new feature that you want to implement, and you don't want to continue on the master branch; you want to continue on a new branch to keep things separated, just to make sure that nothing breaks. Let me show you how to do that. Just for the sake of demonstration, we add a new row to this data frame. Awesome. And we want to publish this to a new branch.

How we do this is we go to Source Control again, and we come down here to the bottom where it says Branches. As you can see, here we just have the master branch, which we can also check here on GitHub: we're on the master branch. Now we want to create a new one. We can do this by hitting the plus icon, then deciding on a name for our branch, so let's call this development, and then we can either Create, or Create Branch and Switch to it. I will do that: I'll create the development branch and I'll switch to it. So now everything that we commit will be committed to the development branch. You can also see that here: Enter to commit on development. So this will be our new feature. We commit it, hit yes, and it will commit it to the development
branch, and then what we can do is push it; we can publish it. And what we can now see, if we go, oh, it already refreshed. Now we have a master but we also have a development branch. Under data, and then our file, you can see that we now have our new feature with our new row in our awesome data frame. Now we're on development; if we check out the master, it's still the original one with the less awesome data frame with only three rows. So yeah, that's how you work with branches. If you then want to switch back again to the master, for example, you can do that by clicking over here, and now we're on the master again. And if you want to merge these files together, you would do that just like you do with a regular Git project. You can do this just from GitHub: you create a pull request and then you can merge your files.

So yeah, that's what I wanted to show you in this video: how to use Git within VS Code for data science. I hope that you've learned something, and I would highly encourage you to start exploring it and using it in your next data science projects. Now, if you like this video, I would really appreciate it if you like this video and subscribe to the channel. I'll be making more videos related to data science, machine learning, Python, basically everything to help you level up as a data scientist. So if that's something you're interested in, you should definitely subscribe. See you next time.
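
As a rough command-line counterpart to the local steps in the video (initialize a repository, make a first commit, create a development branch, commit a new feature there, switch back), here is a minimal Python sketch that drives the Git CLI through `subprocess`. It is only an illustration under stated assumptions: the file name `data_prep.py`, its contents, the helper names `git` and `run_demo`, and the placeholder identity are all hypothetical and not taken from the video, and the script works in a throwaway temporary folder rather than a real project. It needs Git on the PATH and does not cover publishing to GitHub.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return its trimmed stdout."""
    cmd = [
        "git",
        "-c", "user.name=Demo User",          # placeholder identity (assumption)
        "-c", "user.email=demo@example.com",  # placeholder identity (assumption)
        "-c", "commit.gpgsign=false",         # avoid signing prompts in the demo
        *args,
    ]
    done = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return done.stdout.strip()


def run_demo(repo: Path) -> dict:
    """Replay the local workflow from the video in the folder `repo`."""
    script = repo / "data_prep.py"  # hypothetical file name

    # Counterpart of the "Initialize Repository" button: creates the hidden .git folder.
    git(repo, "init")

    # First commit: stage the new file and commit it with a message.
    script.write_text("rows = [1, 2, 3]\n")
    git(repo, "add", "data_prep.py")
    git(repo, "commit", "-m", "first commit")

    # The initial branch is "master" or "main" depending on the Git configuration.
    main_branch = git(repo, "rev-parse", "--abbrev-ref", "HEAD")

    # Counterpart of "create branch and switch to it", then commit the new feature.
    git(repo, "checkout", "-b", "development")
    script.write_text("rows = [1, 2, 3, 4]\n")
    git(repo, "add", "data_prep.py")
    git(repo, "commit", "-m", "new feature")

    # Switch back: the working copy shows the original three-row version again.
    git(repo, "checkout", main_branch)

    return {
        "has_git_folder": (repo / ".git").is_dir(),
        "main_branch": main_branch,
        "commits_on_main": int(git(repo, "rev-list", "--count", main_branch)),
        "commits_on_development": int(git(repo, "rev-list", "--count", "development")),
        "file_on_main": script.read_text(),
    }


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("git was not found on PATH; install it first")
    else:
        with tempfile.TemporaryDirectory() as tmp:
            for key, value in run_demo(Path(tmp)).items():
                print(f"{key}: {value!r}")
```

The point of the sketch is the same as in the video: the development branch carries the extra commit, while the original branch keeps the earlier version of the file until the two are merged.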
Info
Channel: Dave Ebbelaar
Views: 9,110
Id: jrsIHDGHfRc
Length: 16min 36sec (996 seconds)
Published: Thu Jul 14 2022