Standout as a Data Analyst with THIS TOOL

Video Statistics and Information

Captions
Recently I was going through an interview process for a side consulting project I wanted to pick up. This interview process required collaborating with others on a small-scale project. For it, I had to utilize Git via GitHub to collaborate with some of the members of the interview team and then show my final deliverable. I eventually found out that I was hired for the project, and when I solicited feedback, I found out they actually encounter many data analysts that don't know how to use Git. With this video, I'm hoping you're better informed to use Git in your future projects.

What up, data nerds! I'm Luke, a data analyst, and my channel is all about tech and skills for data science. In this video today we're going to be going over Git and also GitHub, and how they can be used in your role as a data analyst in data science. Git in general can be a very overwhelming topic for somebody new to it, so we're going to focus on just the core basics to get you started. As many of y'all are working on capstone projects from things like the Google Data Analytics Professional Certificate, or just a project in general, I wanted to showcase a way that you can display this portfolio online. With this newfound skill set of Git, I'm hoping that you are better set up for success, so you can do like I did and collaborate with other teams on data science projects.

So let's get started, first with understanding what Git is and how GitHub fits into this whole thing. Git is a version control system. If you've ever used something like Google Docs or Microsoft Word, where you can track changes, it's a very similar concept in that it's tracking the changes to a specific file. But instead of only tracking changes in one single file, it does this across an entire folder structure, so you can have multiple different files you're working on, such as Python or R, that it's tracking the changes in. It even goes as far as to track overall changes in the folder structure, or the addition and removal of
different files from those folders. All these changes are recorded in what's called a Git repository, which is a hidden folder inside of your main folder.

So you're probably asking yourself, why do we need to track these changes? When you're working in teams and everybody's working on their own machine with their different portions of the code, whenever you go to bring this code back together you need a method that tracks all these different revisions, so that you can seamlessly integrate the changes.

So how does GitHub fit into all of this? Git itself is software that you install, and it is then used within your folder structure to track those changes. GitHub is an online platform where you can take your files and folders and that Git repository and host them online, so that others can access them and you can collaborate. When used in this manner, GitHub is called the central or remote repository, and this is where the source of truth for your files resides. One can then pull this central repository into their local repository and work on the code or whatever it may be. Once your changes are done, you would then push them back to the central repository to update it.

For a single-person project this may seem like a little bit of overkill to host on GitHub, but it's actually good because now you have a location where you can share your portfolio. An even bigger benefit of GitHub is that whenever you start collaborating on teams, it provides a central repository so multiple members can access and work on code at the same time.

To show this benefit, let's say you're working on a team-based project, the central repository is located on GitHub, and you want to make changes to this project. So you pull that central repository onto your local machine, so you have your own local copy, and you begin making changes to the project itself. Now, because of these changes, you have
two different repositories. Let's say during this time a second co-worker also decides they want to make changes to this project. They also pull it and begin making changes. Now we have three completely different repositories. Now let's say you're wrapping up your changes. From there you can push your changes up to that central repository, your changes will be integrated, and the central repository will be updated. Now, when the other co-worker goes to push their changes to the central repository, it's going to have that version that you modified. But because we have this version control system, we'll be able to easily integrate these different changes, and we'll have a new, updated central repository with all of the changes.

So that's the benefit of having the central repository. The most popular option, and the one I'm recommending here, is GitHub, which was acquired by Microsoft a few years ago because it's so popular. But there are a number of other options that you can check out, such as Bitbucket, GitLab, and also SourceForge.

Okay, let's actually go into an example use case of using Git and also GitHub. Let's say you have some sort of project that you want to host on GitHub. For this example, we're going to use a recent video of mine where I used a Python file to collect data and count the number of mountain bike jumps I had. I have that project right here. We are going to go through the process of initializing it and then hosting it in a repository on GitHub.

Here we are on my local machine, and here is the folder itself, titled "mountain bike project". It has all the different files and folders within it that are part of the project, which is just a Jupyter notebook and then some data and some different charts. I want to initialize Git to start tracking the changes within this project. For this, I want to open a command line at this folder. If you're on a Windows
machine, you're going to be using PowerShell or Command Prompt. On a Mac, you can either use Terminal or you can use iTerm2, so I'm going to have an iTerm2 window open up.

Okay, so don't be intimidated by the command line. It's definitely a skill you need to learn and be familiar with, but all it is, is that we are now inside the command line, located at the folder itself. So if I want to show the contents of this folder, I type ls, and I can see all the different contents right here, which match up with what's in the folder.

So now we want to use Git itself. To see if you have Git installed, if you're on a Mac, you're just going to type git --version. If you don't have Git installed on your Mac, it will prompt you to install it, and if you do, it will tell you the version. If you're on a Windows machine, I'll include a link below, and that's how you're going to go about installing Git on your Windows machine.

Okay, so we have Git installed. I'm going to make a call to Git, so the command is git itself, and from there I'm going to say git init. That's going to initialize this project and start that Git repository. So it initialized this empty Git repository, and as you can see, inside this mountain bike project folder we now have this hidden .git folder, which you can see is sort of grayed out on my Mac. That is a hidden file structure, and that's what's actually tracking those changes. So now it's going to start tracking these changes.

With any project, you're going to need a README file, which is a Markdown file, so I'm going to go ahead and create that real quick. We want to start tracking all the different changes we've made within this folder, so we're going to run a git add command. We'll do git add and then -A, and this just says: add all these different files to the staging area. Okay, they're now added. Now we need to actually commit these changes, or actually update them within our
Git repository, saying these are the changes we want. From here we're going to run git commit, and we're also going to include a message on what we're doing, so this is the initial commit. As you can see, it has eight files within this folder right here, and it committed all these different changes. So our local project and the Git repository are up to date with all of our recent changes.

So now we have all these changes, they're done, and we're tracking them via Git on our local computer. We want to get this to GitHub so that we can display it to the world. So go to your favorite browser, and if you don't have a GitHub account already, go in and sign up for one. Once you have your account set up, you're going to create a repository on GitHub itself so that you can store your files there. To create this repository, go to your profile page, navigate to Repositories, and create a new repository. From there you'll name it, make it public because you want it to be seen by everybody, and don't worry about checking any of these different items right now. Then we'll click Create repository.

Okay, so now we have this empty repository on GitHub, and what it's saying now is: hey, you can either create a new repository, or you can push an existing repository, which is our case. So you can actually just copy all of this code right here. I'll copy it, go back to the command line, paste that code in, and press Enter. What has happened now is this has connected your local machine and your local repository to that remote repository, which is located on GitHub, and pushed your files up to it.

So let's go check it out on GitHub. I'll navigate to this repository, and bam, there is all my stuff. It has my charts folder, my data folder, my Python file, and my README (I had updated it further with different information), but it has all the different
files and folders located right here, now on GitHub for the world to see.

Now that we have our project hosted on GitHub, what's going to come up is that we need to make changes to the project. So let's go over some key ideas and concepts behind Git. Whenever we initialized Git using git init, all of the files were untracked. What we wanted to do was start tracking them within Git, so we used git add to start tracking the changes, and this adds them to the staging area. They are now in an area that holds all the different files we may want to stage, and from there we can commit those changes and update our local repository to the most recent revision. Once we have made a commit, what usually follows is a push, so we ran that git push command to push our local repository to our remote repository.

Now, like I said, the time is coming when we want to make changes to the project. When we go in and make changes to any of these files, they go from that committed zone over to a modified area. Once we are done making our changes, we can do git add to put them back into the staging area. Once we have everything staged that we want to commit, we can commit it back to the committed area, and from there push up to GitHub if we want to, and around the circle the flow goes.

So let's actually apply this: I want to add a file to this project and then update it on the remote repository on GitHub. The first command that you need to be very familiar with is git status, and this just tells you the status of your repository. Here we're on the branch main, which is where we want to be, it's up to date, and there's nothing to commit. So there's nothing really glaring here; it's just saying, hey, everything is normal. From here I want to create a new Python file. We'll say it's a hello world Python
file, and it's just a simple script that prints "hello world". So I'm going to create that new file, and we can see that it was created right here. Now let's go back in and run git status to see what's happening. We can see the file we added, and going back to what we previously learned, it's an untracked file, so we need to add it. We'll do git add and then the name of the file itself, helloworld.py. Once again, let's run git status to see what's going on, and we can see that helloworld.py is a new file and it's in the staging area, under "Changes to be committed". So now let's actually commit those changes with git commit.

So we committed these changes. Once again I'll run git status to see what the status is, and it says, hey, we're still on the branch main, but your branch is ahead of origin/main, which is what is on GitHub, by one commit. So we have updated work in our local repository that is ahead of what's on GitHub, and we now need to push it to GitHub. I'll go ahead and type in git push, and bam, we have now pushed that to GitHub. We can go to GitHub itself, I'll refresh this web page, and right here is our new helloworld.py file on GitHub. Bam.

So there you have it. That is how you can use Git to build a project and showcase that project on GitHub, and then, if you have any changes, how you can update your project as well. For those that aren't as familiar with the command line as I went through it today, I strongly encourage you to start practicing, because it's such a good skill to have in order to utilize your computer to its maximum capability. So, as always, if you got value out of this video, smash that like button, and with that, I'll see you in the next one.
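The command-line steps described in the captions can be sketched as the short script below. It is an illustration only, not the exact commands typed in the video: it assumes Git is already installed, it uses a local bare repository as a stand-in for the GitHub remote so that it runs offline, and the scratch paths, the README contents, the commit messages and the commit identity are all placeholders. Only the file name helloworld.py and the commands git init, git add -A, git commit, git status and git push come from the transcript.

```shell
#!/bin/sh
set -eu

# Scratch area so the sketch never touches a real project (placeholder paths).
work="$(mktemp -d)"
remote="$work/central.git"             # hypothetical stand-in for the GitHub repository
project="$work/mountain-bike-project"  # hypothetical local project folder

git init --quiet --bare "$remote"
mkdir "$project"
cd "$project"

# 1. Start tracking the folder: this creates the hidden .git directory.
git init --quiet
git config user.name "Example Analyst"        # placeholder identity for the commits
git config user.email "analyst@example.com"

# 2. Stage everything, then record it in the local repository.
echo "# Mountain bike project" > README.md
git add -A
git commit --quiet -m "Initial commit"

# 3. Connect the local repository to the remote and push.
#    With GitHub, the URL would be the one shown on the new repository's page.
git branch -M main
git remote add origin "$remote"
git push --quiet -u origin main

# 4. The change cycle: untracked -> staged -> committed -> pushed.
echo 'print("hello world")' > helloworld.py
git status --short            # lists helloworld.py as untracked
git add helloworld.py
git commit --quiet -m "Add hello world script"
git status --short --branch   # reports main as ahead of origin/main by one commit
git push --quiet
```

Running git status between the steps, as the video does, is the easiest way to see a file move from untracked to staged to committed.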
Info
Channel: Luke Barousse
Views: 58,136
Keywords: data viz by luke, business intelligence, data science, bi, computer science, data nerd, github, gitlab, bitbucket, git repository, git, version control system, version control, shit is complicated, data science tools, data science best tool, git for data science, how to use git, using git for data science
Id: aw14VK9sN2s
Length: 14min 32sec (872 seconds)
Published: Sat May 08 2021