David Baumgold - Advanced Git - PyCon 2015

Captions
(clapping) Hi everyone, my name's David Baumgold, I work for edX, I'm really excited about Python, I'm really excited about a lot of other things with web development and with software in general including git. I have my slides online, you can see them at bit.ly/git-pycon-2015. I have a lot of stuff to go over so I might be talking slightly fast, so you can use this as a reference. So I'm assuming that if you're here then you know the very basics about how to use git, or really any version control system. You know how to clone a repository, you know how to switch branches, after you've edited files you can commit, and you know how to push and pull to whatever you're using as your central repo which, for most of you, is probably GitHub. I'm also going to be using some visual terminology in this talk, so what I have here is an example of a git repository and you can see that there are a bunch of commits and there are two different branches: there's the master branch and there's the feature branch. Now, one thing to note about this is that you can see in this diagram that in git, there's actually no such thing as a branch object. Branches are actually labels that merely point to commits. And over time, using various different git commands you can change where those labels point to in order to change how those branches are structured, and we're gonna come back to that later. Another thing I want to point out here is that it's typically easier to display a git repository with arrows pointing upwards, to represent the branches. That's not technically true, the commits in git actually have arrows going backwards so each commit knows what its parent is and that parent knows its parent and so on. So once you've pointed to a specific commit, you can find all the commits in the branch just by traversing that path one step at a time to go back in time. So as I said we've got a lot to cover so let's just get started with the preface. So this is git status.
This is probably one of my favorite commands. If you've used git on the command line you're probably very familiar with it. If you've used git with a GUI client you might not be. Status is something that simply shows you the current state of your repository and your files on there. It's something that you can always run, it never changes anything about your repo, it never screws anything up, it just gives you more information and as a result I run it all the time. So this is git status when there's nothing interesting going on. Here's git status when there is something interesting going on. So this will tell me that there's a file that I've deleted but haven't yet committed, there's a file that I've modified, and there's an untracked file that I've added. So as I've said, this just gives you some information about the situation here. Another really useful command is git show. Now when you run git show, what that does is it shows you information about a specific commit. If you run it without any arguments, it's going to show you information about the commit that you are currently sitting on. So there are a couple of things I want to point out here. The first is that really long string of letters and numbers right at the top. That is called the commit hash. Git assigns a unique ID, or a hash, to every single commit in your repository. And what's really interesting about that is that the hash is calculated based on the parents, it's based on the contents, it's based on the commit message, and it's structured such that, realistically, no two commits, even across repositories, even across the world, are going to share the same hash. They use enough distinct pieces of information that you can reasonably be certain that a full commit hash is completely unique across all repos, across everything. So it is truly a unique identifier for any given commit. 
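The way a commit hash depends on parents, contents, and message can be sketched in a few lines of Python. This is a simplified illustration, not git's own code: it assumes the default SHA-1 object format, uses the same string for author and committer, and the author line and messages are made-up values. The function name `commit_id` is hypothetical.

```python
import hashlib

# The id git gives to a tree with no files in it (SHA-1 object format).
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Illustrative author line: name, email, Unix timestamp, timezone.
AUTHOR = "Jane Doe <jane@example.com> 1428710400 +0000"

def commit_id(tree, parent, author, message):
    """Hash a commit the way git does: sha1 over 'commit <size>\\0' + body."""
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines.append(f"author {author}")
    lines.append(f"committer {author}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

first = commit_id(EMPTY_TREE, None, AUTHOR, "first commit")
second = commit_id(EMPTY_TREE, first, AUTHOR, "add a widget")
# Same contents and message, different parent: the id changes.
moved = commit_id(EMPTY_TREE, None, AUTHOR, "add a widget")

print(first)
print(second)
print(moved)
```

Because the parent's id is part of the hashed text, copying a commit onto a different parent always produces a different id, even when the diff and message are identical.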
You can also see using git show information about the commit message, and you can actually see the full diff of the commit, as well. So you can see everything that happened in that commit. So that's status and show. Now I'm going to show you some more interesting things. How many of you have ever been in this situation? Where you're trying to figure out what the heck is going on here. So git blame is gonna help you to figure out what was going on in that crazy piece of code that you're seeing in front of you. What git blame does, is it's going to look for every line of a file that you pass it to, and it's going to find the last commit that touched that line and it's going to tell you what it is. It's gonna show you the commit hash, it's gonna show you the name of the person who committed it, and the date. And remember, once you have the commit hash, you can also pass that to git show, to figure out more information such as the commit message. So, if you write good commit messages, then people down the line, including yourself, can understand just what you were thinking when you ended up with this code in this crazy state. So the output of blame is a little bit overwhelming, and you don't have to understand everything about how this works, I just want to show you the structure of this. So here's the first five lines, that I've broken up for clarity. You can see the first section is the commit hash, as a shortened version, followed by the name of the file, followed by the person who committed it, and then you can see the date, the line number, and the actual contents of the file. And so as a result, you can see the full structure of the file and you can see how things changed, and which lines worked together. So in this case, you can see the last two lines in this section of five actually came from the same commit. So that will help you to understand which lines are sort of hunks that came together. So that's blame. Next up, cherry-pick. 
Now we're gonna start changing things. So how many of you have been in this situation? I committed to master when I meant to commit to my feature branch, and I need to move my commit to a different branch. Happens to me all the time. So in this case, we have this commit "J", which is on master, and we want to move it to the feature branch. So how do we do that? We can use cherry-pick to do that. So in this case, what I'm gonna do, is I'm gonna use git show to get that commit hash, and then I'm going to check out the feature branch and I'm going to use git cherry-pick in order to move that commit, or actually to make a copy. Another thing that I want to point out here, is you can see that the commit hash is very very long, but when I called git cherry-pick, I'm only passing it the first few characters of that commit hash. Now the reason I'm doing that, is because git is actually pretty smart when it comes to references to commits. And if you provide the first few characters of a commit hash, but not the full thing, what it does is it looks up all the commits in that repository that it knows about, and it checks to see how many of those commits use those small number of letters and characters as a prefix. If there's only one commit in the entire repo that has that particular prefix, it'll just use that commit. And realistically, these things are so long, and so random, that generally, if you provide the first six or seven characters of a commit hash, that's enough to uniquely identify it. So that's why you often see hashes that are only six or seven characters long. It's just a shortcut. So what cherry-pick actually does when you run this, is it's going to create an entirely new commit, which is based off the original, with the same diff, and the same commit message. And it's going to add it to the branch that you specify. It does not delete the original commit. It keeps the original one around, because cherry-picking just makes copies.
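The prefix lookup described above can be sketched as a small function. This is a toy model of the idea, not git's implementation: `resolve_prefix` is a hypothetical name, and the ids in `known_ids` are invented for illustration.

```python
def resolve_prefix(prefix, known_ids):
    """Return the single id starting with prefix, or raise if none or several match."""
    matches = [cid for cid in known_ids if cid.startswith(prefix)]
    if not matches:
        raise LookupError(f"unknown revision: {prefix}")
    if len(matches) > 1:
        raise LookupError(f"ambiguous prefix {prefix}: {len(matches)} candidates")
    return matches[0]

# Invented 40-character ids standing in for the commits in one repository.
known_ids = [
    "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678",
    "a1b2ffd4e5f60718293a4b5c6d7e8f9012345678",
    "9f8e7d6c5b4a39281706f5e4d3c2b1a098765432",
]

print(resolve_prefix("9f8e7d6", known_ids))  # only one id has this prefix
print(resolve_prefix("a1b2c", known_ids))    # five characters are enough here

try:
    resolve_prefix("a1b2", known_ids)        # two ids share this prefix
except LookupError as err:
    print(err)
```

The shorter the prefix, the more likely two ids share it, which is why six or seven characters is the usual compromise.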
So the obvious next question is, alright, so now I've got this thing where I want it to be, how do I remove "J" from master so that it was like I never committed it in the first place? 'Cause I don't want it there. So for that, we're going to use a command called reset. So what reset does, is it resets the branch pointer, to point to a different commit in your repository. So here's an example where what I'm doing, is I'm going to check out the master branch, and then I'm going to call git reset dash dash hard HEAD caret. So HEAD is another reference, and that reference just means whatever commit I'm currently sitting on. So if you're on the master branch and you reference HEAD then that just means the latest commit on the master branch. You can also use carets to climb that ancestor tree, and find the parent of the commit that you're looking for. So HEAD caret means that commit's parent, HEAD caret caret means the grandparent, HEAD caret caret caret means the great-grandparent, and so on. You can just keep on adding more carets on there, and it'll keep on climbing back up. So, after I've done this git reset dash dash hard, the repository's gonna look like this. The master pointer is now pointing back at "F", which is where it was before. And "J" is still in the repository. It's kinda just hanging out in the ether there, and if you find that you've done something wrong with reset, or if you've accidentally lost a commit, it's still there. It's not actually gone. And I'll show you in a little bit how to actually get that commit back. One of the fun things about git is that although it gives you the freedom and the flexibility to change history, and to muck about with the way that your branches are structured, you never actually lose information until the garbage collector runs. So yes, git has its own integrated garbage collector.
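As a rough sketch of what `git reset --hard HEAD^` does to the branch pointer, here is a toy model in Python. It is an illustration under assumptions: the commit letters are a guess at the slide's layout, the dictionaries stand in for git's object store and refs, and only the plain `^` and `~N` suffixes are handled (git's revision syntax has more forms, such as `^2` for a second parent).

```python
import re

# Each commit knows only its parent; a branch is just a label on a commit.
parents = {"A": None, "B": "A", "C": "B", "D": "C", "E": "D", "F": "E", "J": "F"}
branches = {"master": "J"}
head = "master"  # HEAD names the branch that is currently checked out

def resolve(rev):
    """Resolve 'HEAD', a branch, or a commit, followed by any number of ^ or ~N."""
    match = re.fullmatch(r"(\w+)((?:\^|~\d+)*)", rev)
    if match is None:
        raise ValueError(f"unsupported revision syntax: {rev}")
    name, suffix = match.groups()
    commit = branches[head] if name == "HEAD" else branches.get(name, name)
    for step in re.findall(r"\^|~\d+", suffix):
        hops = 1 if step == "^" else int(step[1:])
        for _ in range(hops):
            commit = parents[commit]  # climb one generation
    return commit

def reset_hard(rev):
    """Move the checked-out branch label; no commit is deleted."""
    branches[head] = resolve(rev)

print(resolve("HEAD^"))    # F, the parent of J
reset_hard("HEAD^")
print(branches["master"])  # F
print("J" in parents)      # True: J still exists, nothing points at it
```

In this model `HEAD~2` and `HEAD^^` resolve to the same commit, which matches how git treats the tilde form as shorthand for repeated carets.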
And until that thing runs, which is fairly infrequently, all of the commits that you've ever created and touched are still hanging out somewhere in the ether, and you can get them back. So you can change history, and you can do these things to modify commits without being afraid of losing it, cause it's still there for you. So that's reset. Next up is rebase, which is the command that probably a lot of you have heard about, and are very curious about. So rebase is the command for changing history. And I have to put up a warning here. Rebase is something that gives you a lot of power, and you have to use that power responsibly. As we've learned from Doctor Who, when you go around changing history, there's a high probability that you're gonna encounter some monsters. So I'm gonna give you some pro tips about how to do this and how not to do this. So in general, never change history when other people might be using your branch, unless they know that you're doing so. So an example of that is, if you've got a team of maybe three or four people, all working on a feature together, and you've got a feature branch, and you're not expecting anybody else in the whole world to be looking at that branch except for you and your team, then rebasing is fine, just let the other people on your team know when you do so, so that they can update their branch pointers to get the new history. You also never want to change history on master, or more specifically, never want to change history on master once it's public. Once it's published on GitHub or Bitbucket, or wherever you're hosting your code. Because people generally rely on master, and it's just very confusing when that changes. Generally, you only want to change history for commits that haven't yet been pushed. That's the best practice. There's always a trade-off between when you want to do this, and when you don't want to do this, if you've pushed something ten seconds ago and you want to hastily make a change, maybe that's OK?
That's kind of up to you. It depends on the situation. So how do we actually do this? Here's an example of when we would want to use rebase. How do we do that? When you have a branch that is, you've created it awhile ago, and master has changed, and you want to get those updates. A lot of people use merge to bring those updates into the branch, but the problem with doing that is then you get these weird merge commits and you get the commits from master in your branch, and when you try to get somebody else to code review it, those changes show up in the code review, and it's just generally very confusing to try to explain, no no no, those changes aren't actually mine, they were there before. So you can get around this whole situation by using rebase. So what rebase actually does, is it finds the merge base, which is basically the commit that you originally branched off of, in this case, it's commit "C", and it cherry-picks all of the commits since that point, to where you want them to be. And it reassigns the branch pointer. So the branch that you were working on now has a new base off of master. That's where the term rebase comes from. The branch has been re-based. And that's really all it does. It's just a series of cherry-picks. So when you understand the architecture behind git, these things become much less scary, and much more understandable. So let me show you how you would actually execute that. So I've got this feature branch, I'm going to check it out, and then I'm going to say git rebase master to say I want to take this branch, and I want to rebase it on top of the latest version of the master branch. And it's going to rewind back to that merge base, as I was saying, it's gonna find the commits that were on your branch, and it's going to apply them, one after another, onto the latest version of master. So after you've done this, as I said, you have changed history. 
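The description of rebase as "a series of cherry-picks" can be sketched with a toy model. Everything here is an assumption made for illustration: commits are reduced to a parent plus a one-word change, ids are a short hash of those two things, the letters are a guess at the slide's layout, and conflicts and merge commits are ignored.

```python
import hashlib

commits = {}  # id -> (parent id, change)

def commit(parent, change):
    """Store a commit whose id depends on its parent and its change."""
    cid = hashlib.sha1(f"{parent}|{change}".encode()).hexdigest()[:7]
    commits[cid] = (parent, change)
    return cid

def ancestors(cid):
    """List commit ids from the given tip back to the root."""
    chain = []
    while cid is not None:
        chain.append(cid)
        cid = commits[cid][0]
    return chain

def merge_base(a, b):
    """Nearest commit reachable from both tips (linear histories only)."""
    seen = set(ancestors(a))
    return next(c for c in ancestors(b) if c in seen)

def cherry_pick(cid, onto):
    """Copy one commit's change onto a new parent; the original is untouched."""
    return commit(onto, commits[cid][1])

def rebase(branch_tip, upstream_tip):
    """Replay the branch's own commits, oldest first, on top of upstream."""
    base = merge_base(branch_tip, upstream_tip)
    own = []
    for cid in ancestors(branch_tip):
        if cid == base:
            break
        own.append(cid)
    new_tip = upstream_tip
    for cid in reversed(own):
        new_tip = cherry_pick(cid, new_tip)
    return new_tip

# master: A-B-C-D-E-F, feature branched from C: G-H-I
tip = None
for change in "ABC":
    tip = commit(tip, change)
c = tip
master = c
for change in "DEF":
    master = commit(master, change)
feature = c
for change in "GHI":
    feature = commit(feature, change)

rebased = rebase(feature, master)
print([commits[cid][1] for cid in reversed(ancestors(rebased))])
print(feature in ancestors(rebased))  # False: the old tip is not part of the new history
```

The replayed commits carry the same changes but have new parents, so they get new ids. The old tip is absent from the rebased branch's ancestry, and that is the sense in which history has changed.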
And because you've done that, git is going to show you some interesting warnings and output which might be kind of scary at first. So if you run git status, you might see a message like this. Which says that your branch and the origin have diverged. So what does that actually mean? If you were to take a look at what these two different repositories look like right now, it makes a lot more sense. Your local version has a completely different version of feature branch, than the remote version does. And in effect, there's no way to get directly from I, which is the commit on the remote, to I-prime, which is the one that we've rebased. You have to sort of go back in order to go forward. And as a result, git treats that as the branches being diverged. So what do we do? If you try to push, it's also going to -- at first, it's just going to say no. So it's going to say, there's a change of history here, I'm not sure that this is actually what you want me to do. And so it's going to reject the push, unless you tell it that you really, really want to do this. And in order to do this, you need to use git push dash f. And the "f" just stands for "force". And it's going to do what you tell it to, but it's going to let you know, hey, this was a forced update, I'm doing this sorta-kinda under duress. So just be aware, history has changed. So git will allow you to change history, but it's very careful about letting you do so, and it wants you to be sure that you know what you're doing. Which is smart. In addition, when you rebase, sometimes you get conflicts, which, you'll get this sort of big scary error message, but the thing to note is this line that says conflict. And those conflicts can actually be resolved just like merge conflicts. If you know how to resolve a merge conflict, you know how to resolve a rebase conflict.
So the first thing to do, always the first thing to do, is to run git status, and that will tell you what the situation is, it will never muck anything up, and in this case, it'll tell you that you're in the middle of a rebase, and there's this thing, there's this file that has a conflict. If you take a look at the file, you can see that inside the file, it looks exactly like a merge conflict. And you can resolve it with your normal tools. And then once you've done that, the next thing is to resolve it, and tell the rebase to move on. But it's slightly different from merge. With a merge, you'd create a new commit, and just say, let's go. But with this, you can see with git status it says you want to fix the conflicts, and then run git rebase dash dash continue. If you're in the middle of a rebase, and you get these conflicts, and you're having some problems and you want to start over, then you can just use git rebase dash dash abort and it'll put you right back where you started, no worries, everything's fine again, and you can retry the rebase later, if you want to. And you can also get conflicts when you're doing a cherry-pick. So you'll get a message like this, and again, git status is your friend. It'll tell you what the conflicts are, and then once you're done resolving them, you want to run git cherry-pick dash dash continue. If you decide it's not worth it, then you can run git cherry-pick dash dash abort and you're all set. Alright, so that was rebase, with a quick throwback to cherry-pick. Next up is the ref log. So if you're changing history, the ref log is going to be your best friend. Because the ref log is going to save your ass, I guarantee it. So when you run git log, it shows you the commits on your branch in ancestor order, which means you're going back from parent to parent to parent.
When you're running reflog, it's the same basic concept, but it's going to show you the commits in the order that you last referenced them. So you see all these crazy arrows on the diagram, and you're wondering, what does that mean. Let me give you a concrete example. So I screwed up, I did some kind of a rebase or a cherry-pick or something, or I did a reset, and I lost the commit that I was looking for, and now everything is kinda, just, wonky. So I want to get back to the way that things were. But I didn't write down the commit hash. So what do I do? reflog will help you out. So if you run reflog, what it's gonna do, is it's gonna show you the last few commits that you've been touching lately. And it'll show them to you in order, with the action that you've been doing. So you can see with this history, you can see that the last thing that I did was I tried to do a rebase, and I aborted it. And before that, I made a commit. And before that, I did some checking out to switch between various different branches. And before that, I did another rebase, but this one I finished successfully. And you can see that each one of these lines has a commit hash next to it. So I can find the commit hash that represents the state of the repository from before things went wonky, and I can just go back to that. So once you find the commit hash that you think is what you want, you want to check out that commit hash, take a look around at the files, make sure that everything works the way that you want, this is just sort of checking it, and then you want to reset the branch pointer back to that commit. And you can use reset dash dash hard to do that. And poof, just like that, nothing's wonky anymore. It's back to the way it was. Alright, so that's the ref log. Now I'm gonna show you some more cool things that you can do, now that you know how to change history, squashing and splitting. How many have been in this situation? 
I just made a commit, and ugh, I forgot to include that file that actually has some really important changes. So you can use git commit dash dash amend to actually amend the commit that you just made, to roll in any other changes that you need to make. So this is going to make a new commit based off of your most recent commit, but with the addition of any other changes that you've just used git add or git rm to add into your situation, and it's going to replace the topmost commit with that new commit that has everything all rolled up into it. So you never have to have any more "adding a missing file" or "fixed a typo" messages. You can just amend your commits. You also can do this with situations where you have a bunch of these commits back in history, it doesn't have to be the latest one, but for this we're going to use interactive rebase. So for this, you need some place to start. So in this case, we're gonna look at the last five commits, and we're going to use HEAD tilde five, which is exactly the same thing as doing HEAD caret caret caret caret caret, it's just shorter. It just refers to five commits back. Git is then going to open a file in your text editor and ask for further instructions. So this is what that file looks like. It's actually not really a file, it's more like a user interface. So you can see over here, there are the commits that you've requested, down at the bottom there's a list of instructions, and on the left side there's a list of actions that you can take on each of those commits. So in this case, you can see there are these two commits after the "added a widget" commit that were basically just "oops" commits and we basically want to roll those in, so it looks like we never made a mistake. You can be a perfect coder when you can change history. So what we're gonna do is we're just gonna change those two "pick"s to the word "squash" which allows us to meld this commit into the previous commit in the list.
And then you're going to save and quit your editor, and it's going to immediately re-open asking you for a new commit message. And it's going to provide you with the commit messages from the last few commits that you were squashing together that you can use as a template, but you can put in whatever you want. So you put together your commit message, you save and quit, and then git will apply the changes that you requested, it'll squash those commits together, and everything will be great. And as a reminder, doing this does change history. Even doing the commit dash dash amend, that changes history as well. So all the previous warnings about doing that apply. Be careful. I also want to go over splitting commits. So when you have a commit that is really too big, and you want to split it into lots of smaller ones, you can use rebase interactive for that, as well. So let's say we have this commit here, that middle one, where we did a bunch of things, which is not really a great, useful commit message, and probably has too much stuff in it. So what we're gonna do is we're gonna change from "pick" to "edit", which allows us to use this commit, but change it around a little bit before it's actually rebased. So once we've saved that file, then it's going to rebase all the commits up until then, and then it's going to stop. And it's going to say, alright, you've got as much time as you need to make any sort of edits that you want to this commit. So once we're there, we can pop off that big commit, which has already been rebased, so we can just say reset HEAD caret, and I'm not using dash dash hard because when you leave that off, it leaves the changes in that commit actually applied on your file system. So all those changes are there, and if you run git status, you'll see all of those changed files, untracked files, modified files, all ready to be committed.
And now that you have that, you can just add them one at a time, or use git add dash p to do a sort of add by patch, and craft the commit messages that you want to. So you can turn one big commit into several smaller ones. And you can write commit messages for each one, that are more descriptive. And then once you've done that, then we just want to continue the rebase. We want to tell git, OK, I'm done editing this commit, and now we can just move on with all the things that we were doing before. And then it's going to apply any other commits that are necessary, and then it's gonna be done. And you can take a look at git log, and admire your freshly cleaned-up history. Alright, so I'm almost out of time, but there's one more command that I want to go over, which is called bisect. Git bisect is really cool because it helps you to discover what caused things to break. And in particular, which commit caused this breakage to happen. So if you've suddenly discovered that something is broken on your production site, and you're not sure how long it's been broken, and you're not sure what's causing it to break, bisect is gonna be your best friend. Because what bisect does, is it helps you to find the commit where that changed. So to use this, you need three things: you need a test to determine if it's broken or not, you need a commit where things were working, in the past, and you need a commit where things are broken, which is usually going to be, you know, the tip of master, if you discover suddenly, oh, this is broken, and we don't know how long. So what bisect will do, is it will actually use binary search, to search back through time, and find the commit where things went from good to bad, and it'll do that very efficiently. So the way you actually use this, is you run git bisect start, to tell git, OK, we're going into bisect mode now. You want to find the broken commit, and you want to tell git, OK, this is the bad one.
And then you find the working commit, and you tell it, OK, this is the good one. And now, once you've done that, git is going to find a commit in between those two, and it's going to check it out for you, and it's going to ask you to determine if that particular commit is broken, or if it's working. If it's working, you just run git bisect good. If it's broken, you run git bisect bad. Either way, you're gonna have more information about where it changed from good to bad. So for example, if you have a continuum of commits, where you have a broken one, and you have a working one, git's gonna ask you to identify one in the middle. Is it good, or bad? If it's broken, then we can cross off half of the commits in that history, and test next on the other side. And if it's working, then you just cross off the other half. And you just keep on going recursively, until you find that particular commit where things actually changed. It's really fast, and it helps when you have a lot of history going on, and you need to find, you know, what actually caused this thing to break? If you have an automated test, it works even better. There's some documentation that you can find to determine exactly how to write a script that git will understand that uses UNIX exit codes to determine if it's working or broken, and then you can just pass that script to git. And git is going to test commit, check out a new one, test that commit, check out a new one, and keep on going until it finds the one that we want. And that's gonna be a really quick way to find where things broke. Alright, so we're about done, but there is so much more stuff about git that you can learn, git has an integrated help system, the git website has some great documentation, and, any questions?
(applause)
ASST: So if you have any questions for David, you can line up at the microphone that's in the middle of the room here down the back, we'll just wait a moment to see if there is anyone.
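The binary search behind bisect, as described in the talk, can be sketched in Python. This is a toy model under assumptions: history is a plain list ordered oldest to newest, the first entry is known good, the last is known bad, and the breakage happened exactly once. The names `bisect_commits` and `is_good` and the commit labels are invented for the example.

```python
def bisect_commits(commits, is_good):
    """Return (first bad commit, number of commits tested).

    commits[0] must be good and commits[-1] must be bad.
    """
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_good(commits[mid]):
            good = mid   # the break happened after mid
        else:
            bad = mid    # the break happened at or before mid
    return commits[bad], tested

# 100 made-up commits; pretend the bug arrived in c63.
history = [f"c{n}" for n in range(100)]

def is_good(commit):
    # Stand-in for a real test; on a real repository this would run the test suite.
    return int(commit[1:]) < 63

culprit, tested = bisect_commits(history, is_good)
print(culprit, tested)
```

With 100 commits the search needs at most seven tests rather than up to 98. For the automated form mentioned in the talk, `git bisect run` takes a script and reads its exit code: 0 marks the commit good, 125 tells git to skip it, and other codes from 1 to 127 mark it bad.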
DAVID: I just want to add, as I said, there's so much more in git, if you just want to talk about your favorite particular git command, I request that you not, and just let the people who have actual questions about what I went over ask them instead.
Q1: So that was an amazing talk, thank you. I just wanted to ask, is there anywhere we can get a copy of these slides?
DAVID: They are actually posted online and I have that on the very first slide, it's bit.ly/git-pycon-2015.
Q1: OK, thank you.
Q2: I have a question if you have a suggestion on how to manage multiple repos within git itself, cause if you start nesting them, you run into other problems.
DAVID: Can you describe more what you mean?
Q2: So if I have multiple projects in git, and they're all being managed, is there a good way to manage all of them? I've heard people say, start doing submodules, or other things.
DAVID: Oh, so you're talking about submodules and stuff.
Q2: Yes.
DAVID: Yeah, git unfortunately does not have great support for embedding one project inside of another. Submodules work, kind of.
Q2: Kind of, so.
DAVID: Other people have put together other projects to make that work better, I've heard of git subtree, which apparently is really good. I've never used it. Unfortunately, I can't really give you more information, sorry.
Q2: Alright, thanks.
Q3: Do you have any advice if you're in an organization where you're the only one who does rebasing? (audience laughs)
DAVID: Documentation.
Q3: (laughs) OK.
DAVID: Describe to people what you're doing, write it down someplace where other people can see, give them directions on how they can rebase if they want to, and when you do do a rebase, be sure that nobody else is going to be screwed up by it. If they do get screwed up by it, make sure that you can point them to that documentation, and say, see, this is what you need to know.
Q3: I've seen that I've gotten messed up by their merging.
So develop gets merged into a branch, the branch gets merged into develop, and then somehow when I rebase, things get weird, in ways that are hard to explain.
DAVID: Yeah, rebasing merge commits does work weirdly. I haven't figured out a good way to do that. I would say just, you know, do the best you can with educating your coworkers, your friends, whoever you're working with, and if they do introduce merge commits, then maybe rebasing is not the situation that you want to go with.
Q3: Tear. OK, cool.
ASST: Can you please make sure you get nice and close to the microphone, so that everyone in the audience can hear you please, so, even closer than that.
Q4: Hi, what's the difference between git pull and git fetch?
DAVID: git pull and git fetch?
Q4: Yes.
DAVID: It's actually pretty simple. Git fetch is the operation where git contacts another remote, like going out to GitHub, and downloading information from that remote. It's just a way of saying, hey, tell me what's going on. But git fetch does not change anything. All it does is request information. When you do a pull, it does a fetch, followed by either a merge or a rebase, depending on how you've set it up. That's all it does. Pull just does fetch, followed by merge or rebase. That's all.
Q5: I was wondering, when you are rebasing onto master, and let's say, you have a lot of commits that you're trying to rebase, sometimes if there's a merge conflict somewhere, I've had a situation where I have to go through every commit and fix every single merge conflict. Is there a faster way to deal with that situation?
DAVID: So there are two things that I can suggest. The first one is the sort of simplistic thing, which is, if you're having to resolve it with every single commit, then you can just squash your commits first, so that you only have one commit, in order to resolve it only once. And that works, as long as you're OK with squashing it down.
If you're not OK with that, I know that git actually has a tool built in called rerere, and I believe that stands for reuse rebase resolution. So you can use that tool to tell git, alright, this is how I want to define this resolution, and it will apply it for the rest of the commits in that rebase. I've actually never used it before, just cause for me, squashing commits works fine, but you can look up the documentation on how to use rerere. And it's kinda fun to say.
Q5: Thank you.
Q6: Hey, great talk, thank you. You kinda dove in and went straight to using git reset dash dash hard. Can you just detail what the difference is between a regular git reset and what the dash dash hard is actually doing?
DAVID: Sure. So when you do git reset without any arguments, it's going to pop off the commit, but it's going to leave the changes on your disk, so that you can create a new commit with those same changes, if you want to, or break them up into several smaller commits if you want to. Basically, just reset will simply remove the commit, without actually changing the files on your filesystem. Git reset dash dash hard will also change the files on your filesystem to make them exactly match that commit that you're pointing to. There's also dash dash soft, which is very similar to without any arguments, but what that will do is, I believe it has the changes not yet added to your staging system, so that you have to add them first or something. I'm not clear on exactly the details there.
Q6: Yeah, it's always confusing. OK, thank you.
ASST: Unfortunately, we don't have time for any more questions. If you do have more questions for David, come up and ask him, he'll be here for another five or so minutes.
DAVID: Yep, I'll also hang out in the hallway, if people want to come up.
ASST: Great. OK, so everybody please thank David Baumgold. (applause) Now, we have a ten minute break
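Since the answer to the last question leaves the reset modes a little unclear, here is a toy Python model of the three of them as the git-reset documentation describes them. It is a sketch under assumptions: a repository is reduced to three places (the commit HEAD points at, the staging index, and the working tree), each commit is a one-file snapshot, and the names `snapshots`, `state`, and `reset` are invented. Untracked files are ignored.

```python
# Each commit is modeled as a snapshot of the tracked files.
snapshots = {
    "F": {"app.py": "version 1"},
    "J": {"app.py": "version 2"},
}

def reset(state, target, mode="mixed"):
    """Return the new state after resetting to target in the given mode."""
    if mode not in ("soft", "mixed", "hard"):
        raise ValueError(f"unknown mode: {mode}")
    new = dict(state)
    new["head"] = target                        # every mode moves the branch pointer
    if mode in ("mixed", "hard"):
        new["index"] = dict(snapshots[target])  # staging area now matches target
    if mode == "hard":
        new["worktree"] = dict(snapshots[target])  # files on disk are overwritten too
    return new

# Clean checkout of J: pointer, index, and files on disk all agree.
state = {"head": "J", "index": dict(snapshots["J"]), "worktree": dict(snapshots["J"])}

soft = reset(state, "F", "soft")    # J's changes stay staged, ready to commit
mixed = reset(state, "F")           # J's changes stay on disk but are unstaged
hard = reset(state, "F", "hard")    # J's changes are gone from index and disk

for name, result in (("soft", soft), ("mixed", mixed), ("hard", hard)):
    print(name, result["index"]["app.py"], "|", result["worktree"]["app.py"])
```

In this model, `--soft` is the mode that keeps the changes staged, and the default mode (`--mixed`) is the one that leaves them on disk but requires `git add` again before committing.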
Info
Channel: PyCon 2015
Views: 19,514
Rating: 4.9605913 out of 5
Id: 4EOZvow1mk4
Length: 29min 16sec (1756 seconds)
Published: Sat Apr 11 2015