Git From the Bits Up

Captions
If I could ask you guys a few questions before we get started: this is gonna be about 55, 60 minutes of, ideally, kind of weird Git internals stuff. That's what I'm intending to do; I hope that's what you came for. So I'm just kind of curious as to your level of comfort with Git. Is anybody totally new to Git, like a complete beginner? Don't be shy. Yeah, totally new. Anybody else totally new? There you are. All right, I'm gonna not treat you well, and it's not that I want to treat you poorly, but you understand, you've been warned. This is not how I normally teach people Git. This is some internals and some extra insights about Git that can be fun to have.

And I have a lot of experience teaching Git. That's my job: I work for GitHub, I'm a part of the training team, so I go to conferences and give talks on Git, and I visit enterprise customers and give one-, two-, and three-day classes on Git. I do this a lot. And my experience is that when developers start to see some of the details under the covers, things start to click. I watch this happen. I watch the light bulbs figuratively appear (I want to say literally appear, but that would not be true) over people's heads when some of this stuff starts to happen. So I'm gonna do that. I'm gonna give you some internals at the beginning, and then we're just gonna play around with some other interesting, moderately advanced Git topics that most people don't feel super comfortable with.

If you've got questions... and I appreciate that we are uniformly distributed throughout the room; I mean, we wouldn't want to be all in the front so that we could, like, relate to each other, so well played there. If you've got questions, basically, if you're in the back, you have the advantage of being able to sit in the back, and there's like no way I'm gonna be able to hear you, but
try. I would love to talk; it's more fun for this to be interactive. But if that doesn't happen, if I appear to be in transmit-only mode, we'll still have a good time; that's what I can promise you. Also, fun fact: this talk ends at 12:25, and my flight out of SJC is at 2:10. Now, that can be done, but if I appear to be in a hurry at 12:25, that's not because I don't like you, it's because I like my flight.

So let's start with something really basic. I told you this is internals, but let me show you the proper and respectable way of creating a Git repository and adding a file. For people who came to the advanced Git talk without any Git knowledge, this is your lucky three minutes. If you want to create a new Git repository and you're a normal person, you'd do something like this: you'd say git init jax, and that would make this directory called jax, which is a Git repository that has this hidden folder called .git with all that stuff in it. Maybe you've looked at that stuff at some point in your life, maybe not. It's not there anymore.

Well, then you create a file, maybe like beowulf.txt, and we'd say, "So the Spear-Danes in days gone by had courage and greatness." And then you would add beowulf.txt; you'd say, okay, there it is in my staging area (command-line completion gone awry there), and you make a commit. And that'd be cool. So that's the basic, nobody-gets-hurt way of starting a repository.

Interesting things happen in .git when I do that. In that folder, all of a sudden that objects directory has some things in it. And if you're looking really closely, the bottom directory, called refs, used to just have subdirectories called heads and tags, and now there's a file called master in there, because I'm on the master branch. If you've ever looked at what it means to be a branch in Git before, you've probably seen that those files are down there. If you've ever taken an
introductory Git class from me, you've certainly been pointed to those files. So stuff happened, and we've got a history, and in fact at this point status is very happy; it says everything looks great, and log is also willing to talk to us about what I just did. So that's the normal way that somebody would do something like this. But honestly, I think that is way too mainstream. So what I'm gonna do is blow that away, take that Git repository away, and we're gonna do this piece by piece. I'm gonna show you Git commands that most of us never have a reason to use at all.

You say, "Well, wow, Tim, that's pedagogical genius: show us something we don't need to know." But the process we're gonna go through is gonna show you some things about how Git is put together, and there are, frankly, software design insights to be gained by looking at Git on this level. So it's useful to know Git from the bits up, as the title of the talk says. There are broader insights that'll make us better developers if we internalize them. And this is gonna expose you to kind of an internal Git API that's there for your use; situations can come up in the real world where you might need to use this stuff, and it's there if you need it. So if you want to be that person on the team who's the Git weirdo (I'm a professional Git weirdo, so I understand you), you're gonna need to know this stuff.

So let's jump in. I am in a working directory. I'm gonna make a directory called jax, and that's gonna be my new Git repository. Let's ask it if that's a Git repository. It says no, you don't have a .git directory. Okay, so let's make a .git directory, and Git is still not happy. So it's not enough just to have a .git directory; there's stuff that needs to go in there, and there are kind of two basic things
we're gonna create, and those are the two basic things that a Git repository fundamentally is. A Git repository is a collection of objects and a system for naming those objects. The system for naming the objects in Git, we call those refs. So to be a Git repository, we need objects and we need refs. There's lots of other stuff we might have: we might have configuration, we might have hooks, we might have a description. We don't need any of that. We need objects, we need refs. Let's make those things happen.

So I will mkdir, underneath .git, a directory to hold the objects and a directory to hold the refs. Refs are, again, names for objects in a Git repository. There is a moderately detailed language that we can use to describe refs, and refs in relation to one another, and so forth; that is the refspec syntax. I'm just guessing, but in classroom time, a full disclosure of all the details of the refspec syntax would be something like six hours, a long time, and nobody would sit through it. It's the kind of thing you pick up a piece at a time. But everyone here has used the refspec syntax if you've used Git.

We need refs; those are the names for objects. Refs are sort of, I'm trying to say, a generalized namespace. The particular kind of refs we need are what Git internally calls heads, what normal people call branches. All right, I've got a place for objects, I've got a place for refs; it's starting to look something like a Git repository. Status is still absolutely not impressed. Well, it's gonna take a little bit to make status impressed.

I need to add a file. I am not going to use vi or any conventional text editor to create that file; I'm gonna use echo. This is a commit using echo and Git commands you've probably never heard of. For example, in just a minute I'm gonna need to use a command called hash-object. hash-object, helpfully, computes an object ID and optionally
creates a blob from the file. So what this is gonna do: I'm gonna pipe some stuff into hash-object, it's gonna compute a SHA-1 hash of that stuff, and then it's gonna write that object into the object database as something called a blob. So stick with me here and I'll do it.

Oh wait, before I get started on that: this pane, remember this pane I left over there, because my display is being a pain and not letting me mirror? Let's see: let's clear, let's do a tree of .git, let's sleep for a second, and let's be done. And we were in the wrong place for that. So, all right, here we are: we will clear, we will take the tree of .git, we will sleep for a second, and then we will be done. Every once in a while somebody says, "Hey, don't you know about the watch command?" Were you thinking that? The answer is yes, sir, yes I do. It does some things that I don't quite like. Anyway, we're now looking at the content of the .git directory as I change it.

Let's use hash-object and see what happens to us. Like I said, I'm just going to use echo to get that same text out: "So the Spear-Danes in days gone by had courage and greatness. We have heard of these princes' noble campaigns." That is the opening line of the Seamus Heaney translation of Beowulf. I'm Swedish on my father's side, so it just feels right. All right, echo, that's my text. I'm not going to use a text editor; instead, I'm gonna pipe that into hash-object, and I'm gonna ask hash-object to take its input from standard in, and I'm gonna ask hash-object to write its results to the object database. I don't want to just hash the thing; I want to write it into the object database. Let's watch what happens when I hit enter.

And why would you say that? I'm in jax. Sometimes the best-laid plans... .git, refs, heads, objects, .git, yeah. No, that actually works, so the not-working you saw a minute ago was something that didn't happen. Well, that's interesting. All right, let's solve this problem together. We'll look at
the help for hash-object. It takes input from standard in, which I have piped, and it has -w. That is a little bit of a bummer. I don't know if this is a matter of the recent update to Git I have installed, that there are separate expectations. So here's what we're gonna do to save your time: I'm going to go up there, rm jax, and git init jax, and we're gonna do that in a slightly less hipster way. Now we'll see that there's a bunch more stuff in there, which I will clean out. So for example, I will take away hooks, because they're a little wordy, I will take away the description, and that'll be enough; let's just leave it at that.

All right, once again: git... pardon me, echo, "So the Spear-Danes in days gone by..." Let's clear this, and we see Git is much, much happier. When I did that hash-object... I'm going to have to re-prototype a couple of things here, because the behavior of hash-object has changed since 1.8.1, and I am running on 1.8.2. That's embarrassing, dang it.

But here's what I want to show you. Under the objects folder we get a new directory called b0, and under that a file that's named for a hash. hash-object also gave us the hash of that thing, so you can see the way that hash is mapped into that directory. Objects is essentially a big, giant key-value store of all of the objects that the Git repository is composed of. But if you store those as loose objects, at a certain point any given file system is going to get mad at you for having too many files in one directory; it will not perform well. Git is just dividing that problem by 256: it's taking the first two characters of that hash and making that the directory name. So objects has up to 256 of those directories, each of which can potentially hold a large number of loose files.

Now, in the real world, if you go looking in your .git directory and you don't see a whole lot of loose objects, even though you've got a big project and a
long history, that's because of that directory there called pack. The pack directory holds packed object files, which is, as they say, a whole other thing that we're not gonna talk about. That's another layer of optimization that's just not important to us in the scope of this talk today.

But we need to talk about the blob. That blob contains the content we just echoed into it, and I've got a command that I can use to look at that, cat-file. I could say, would you please pretty-print the content of blob b0d9c44. That's enough; that is unique. Actually, I could have gone with just four characters; Git would have taken that. That object holds the content of that file, and that's how all files are stored in Git. Every time you change a file, the add command, when you add that file to the staging area, is creating a new blob with an absolute snapshot of the whole file. It's not storing just the changes; it's storing everything. If you change one character, think about it: you're gonna hash that new stream of bytes, you're gonna have a different hash, it's a different blob, and that blob gets stored.

How do I know that that object is a blob? Well, number one, because I said so, and number two, because I can ask, and it will tell me it's a blob. The other question I can ask is how big it is: it's 112 bytes long. And that's all the metadata that's in good old b0d9c44a3129, et cetera. That's it: size, blob-ness, and the bytes themselves. beowulf.txt? I haven't said anything about that. There is no file yet; there's just a blob. Let's see what status says. Not much.

Okay, now I need to stage it. I'm gonna use a command called update-index. Now, index: if you've nosed around the Git documentation a little bit, you'll have noticed that sometimes the thing that guys like me call the staging area (hopefully you call it the staging area) is referred to as the index. That's kind of an old term; staging area is the convention now.
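To make the point about blobs and hashes concrete, here is a minimal Python sketch (not from the talk) of how a blob's object ID can be reproduced, assuming Git's default SHA-1 object format. Git hashes a small header (the word blob, the size in bytes, and a NUL byte) followed by the file's bytes. The function names git_blob_id and loose_object_path are made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash content the way Git names a blob (assumes the SHA-1 object format)."""
    # The header is "blob <size in bytes>" followed by a NUL byte, then the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Map an object ID to its loose-object path: the first two hex characters name the directory."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    empty = git_blob_id(b"")
    print(empty)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, Git's well-known empty blob
    print(loose_object_path(empty))

    # Changing a single byte produces a completely different blob ID.
    print(git_blob_id(b"hello\n"))
    print(git_blob_id(b"hellp\n"))
```

The file Git writes under objects is this same header plus content, zlib-compressed, which is why cat-file can report a type and a size without any file name being involved.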
But originally it was called the index, and still in the documentation, and certainly in the implementation, it's still called that. So update-index is going to write a fact into the staging area. I don't have a file to stage there, but I will say I have something to add. One thing I'll need here is my hash; let's go back here. update-index --add --cacheinfo 100644, and then the hash, which is b0 and all that stuff, and then I have to tell it the name of the file, because the blob, again, doesn't have any file name.

Now, when you add a file (you make a change, you add it, it goes into the staging area, it's being prepared to be committed), that actually is taking a snapshot of that content and remembering the file name, remembering the directory it's in, and remembering its file mode, which in this case I'm giving as 644, so it's read-write for the owner and read for others. When I do that, the directory doesn't change at all, but suddenly status says, "Oh, well, I've got a file staged; it's called beowulf.txt." It's showing up in green. It's also showing up in red, because why? The file doesn't exist. So suddenly there's this tracked file called beowulf.txt, but there's no file called beowulf.txt in the working tree, and so Git thinks, "This file doesn't exist; you've deleted it, even though you've staged it." So we're in a little bit of a confused situation right now, and we need to move forward.

I need to create a second object. It's not enough just to have a blob; that blob is content that lives in a file in a directory. This is the data model of Git at the level of a single commit: the file contents are stored in a blob; a tree object stores the file name and the file mode and represents a directory. And right now the intent of this project is to be one directory with one file in it. So we're kind of getting there. Now, I need to remember this time to copy and paste that hash; I'm going to need that. But of course I can use my good old friend
cat-file to ask what type of thing that is, and cat-file says that's a tree. I could ask how long it is; that doesn't really mean anything. I could print out the contents of that tree, and if you look carefully at that, we see: the first thing is a mode (this is like a directory listing), the second thing says blob, the third thing is the hash of the blob, and the fourth thing is the file name. This is just a directory listing. Now, if I had subdirectories, I would have something there that says tree, and the content hash would point to another tree object. So I can represent a snapshot of the whole project right here.

All right, so I've got the tree written. Status doesn't care at all. What I need now is to create a commit. These are the three fundamental objects that a Git repository is composed of. I said a Git repository is composed of this object database, this big, giant key-value store, plus refs, names for the objects. The three kinds of objects you need to know are blob, tree, and commit. Blob is file, tree is directory, and commit owns that. A commit says: this is a snapshot of this project at this point in time, written by this person, at this email address, at this time, in this time zone, with this message, descended from this parent. That's kind of all the metadata of a commit.

As soon as I hit enter here, we're gonna see the third object popping up. Git is good enough to give me that. Now, if you use Git, you know that sometimes you see these hashes, like we see right down at the bottom there, 3d09. Those are all very familiar-looking. Almost all the time you don't care about the tree hashes and the blob hashes; you're looking at commit hashes. But that commit owns a tree, and that tree owns other trees and blobs. That's always the case. When we do log in just a minute, 3d09 is what we're gonna see.

All right, and let's see what status has to say. Status is happier here; let me clear that screen. It is still not entirely happy. I wonder why. It still doesn't really think the
commit has taken place, because we're not quite done with our refs. Now, there's one ref that, if you use Git, you've used; you probably don't think of it as a ref, but it's the word HEAD. HEAD refers to the commit that you're on. It's like the most recent commit, the last one; it's gonna be the parent of the next commit you create. All those things are true about HEAD. And we do have a HEAD: if you look right down on the right, under .git, there's a HEAD. Let's look at what's in there. That's what's called a symbolic ref.

Now, HEAD points to a commit, and right now status is confused because it doesn't really know what commit HEAD is pointing to; a commit hasn't really officially happened yet. HEAD is just saying, "I don't actually know the hash of the commit I'm on, but we're on the master branch." That's called a symbolic ref, and that's a good way for your HEAD to be. If your HEAD actually contains a hash, then that means you are in a rather uncomfortable state called the detached HEAD state. That's not as bad for Git as it is for you: generally you can only have a detached head once, with no real going back, but with Git it is an okay thing to do. What that means is you shouldn't make changes when you're in detached HEAD state; that's sort of a read-only way of looking at the repository.

This shows us Git thinks we're on the master branch, but there isn't anything there. Let me zoom in on the refs folder. See right there? There isn't anything called refs/heads/master. We need to put something there. Now, do I have... yes, I happen to have that commit hash. Let's just check that out: git cat-file, print that out for me. Now, that's wrapping a little bit, but you can see what happened after I did cat-file: it lists a tree, an author, a committer, and then it says "initial commit". That's actually what a commit is, and the log command is always taking that data and interpreting it in some way. When log is representing commits to you, it might be reformatting that data in a
more friendly way. That thing right there, that object, is a commit, and I need to put that in .git/refs/heads/master. Because part of committing when you're on a branch (and if you're not on a branch, if you're in detached HEAD state and you commit, you're doing it wrong), part of committing when you're on a branch like a good citizen, is moving the label of the branch forward to the new commit. Now, there hasn't been a previous commit, so we're creating that branch label for the first time, but I need to do that. I just spat that hash into refs/heads/master, and fundamentally, at the lowest kind of bit level, that's what a branch is. A branch is 41 characters (forty characters of SHA-1 plus a newline) in a file under .git/refs/heads, and the name of the file is the name of the branch.

So right now status is almost completely happy. There's nothing in green anymore. Status thinks the commit has taken place; in fact, log now agrees that the commit has taken place. So we're almost all the way there. By the way, that cat-file output right there looks familiar: the raw log format is basically doing the same thing as cat-file and just giving you a nice yellow commit header on top of it.

My problem now is that I don't have the file. It's gone. One last step. Normally you use checkout to switch between branches: you say git checkout feature-branch, git checkout master, and that moves HEAD over to that branch; if you looked in the .git/HEAD file, you'd see a change. It's also going to rewrite the working tree so that all of your files and directories look like the trees and blobs in the commit that you just checked out. So it changes the working tree, changes your HEAD, and, as it were, puts you on that branch. That's kind of core checkout functionality.

Well, I'm checking out HEAD, which is sort of redundant, right? That's like saying "switch me to where I am right now." But I'm giving it that --. Now, that -- with no switch,
no parameter after it, is just kind of a repeated idiom in Git. You'll see that from time to time. It separates what follows from any of the switches and parameters that came before, and says: however it is you were normally going to do your thing, do it, but just do it to these files. So if you log that way, log -- beowulf.txt, instead of showing the whole history, log interprets that as saying, "I'm only gonna show you commits that changed this file." Or say you have a diff where 20 files are different, but you just want to see the diff of one: you could say diff -- that file name, and it'll say, "Okay, I'll just show you that file."

Every command is different in the way it interprets this. checkout says: I'm not gonna try to switch branches or anything like that; I'm just going to reach into the referenced commit, get that file, pull it out, and dump it into your working tree. So if you knew there's some commit over in a branch, and you don't want to merge the branch, and you don't even really want to cherry-pick the commit because it's too messy, but there's this one file that you need, you can use checkout like this to reach into any commit, grab the file, pull it out, and throw it in your working tree. And with that, we have Beowulf, and status is happy. We have finally, over the space of like 20 minutes, done a single commit.

So why are these things here, and why do this exercise? Number one, I think it's really helpful for you to know that each commit is a tree, in the data-structure sense, composed of these tree and blob objects, representing a snapshot of all the files underneath that commit. There are a lot of implications to that architecture that we could talk about if it was just us around a table; interesting things go on as a result of that. But I want you to know that that's how Git is put together. For me, when I was learning it, that was a big deal; light
bulbs went on. All right, so that's one thing. Number two: hash-object, write-tree, commit-tree, what is this business? Well, these are called plumbing commands. The commands you normally use, like init, add, status, diff, log, branch, checkout, rebase, reset, push, pull, fetch, those are all called porcelain commands. These are Git's terms; I did not decide on the bathroom analogy. But, you know, the user interface in a bathroom is composed of, typically, porcelain (maybe stainless steel or marble, if you're into that, but let's just go with porcelain). That's the part that you touch and use. Now, that is supported by, and composed of, plumbing on the back end. Usually you just deal with the porcelain; most people just deal with the porcelain. If you are a professional, like you're a plumber, well, then you deal with plumbing all the time. If you are doing an addition on your house or remodeling your bathroom, maybe you'll deal with the plumbing. If something breaks, maybe you'll have to know how the plumbing is put together.

So for most people, the plumbing commands in Git (and I would say, by my taxonomy of plumbing and porcelain, 80% of Git's hundred and forty-five commands are plumbing), most people don't need to know them. But it's good to know that they're there in case you need them, because you can get below the high level of abstraction, the very functional add, log, status types of things, and do very fine-grained operations on things. And these are all documented. In fact, the man pages for things like write-tree and hash-object: look at that, it's done. I hit space, seriously, twice. If I get help for a porcelain command (because plumbing's hard, right? It's for experts. Porcelain is easy.), so let's get the help for, I don't know, log... it goes on. And it's got pictures in it, look at that. The porcelain commands are much more functional, much more complex, do more. But you ought to know that the plumbing commands are there, because, and
here's the software design lesson: it's a good idea when you're creating an API. By the way, that is how I'd describe Git: as an API for doing version control things. It doesn't have a workflow; it doesn't have opinions about how you should branch, or how long branches should live, or anything like that. That's up to you. It's not gonna make those decisions for you. It's just this toolkit, this API for building your own collaborative workflows.

Now, when you're building an API, just think at the level of code. You pick some abstraction of the world that you're gonna expose, and the abstraction is a lie to some degree, right? It's a simplification of what the world really is, but you have to do that. You pick some convenient simplification of the world, you represent that in code, and you publish it. You document it, it works, here you go, here's the API. People use the API, and they start to say, "Well, here's an edge case you hadn't considered. I need the API to do this thing, and it doesn't." And what do you do? Well, if you're an undisciplined API developer like me, and you like to make people happy, then you say, "Well, of course, let me add that." You'll add an overload to a method, you'll beef up a data structure that the API passes back and forth, and things get a little more complicated. That happens again and again and again, and after a while the API is very brittle, very hard to learn, very complex, and you've got these 15-parameter methods that you thought were a good idea at the time, and complex classes that it's moving around, and it's just hard to understand. Bad plan.

A better plan is to keep some discipline about that top-level API (and this might not look like discipline to you, but trust me, it is). Keep some discipline about what work that top level is willing to do, but implement it entirely out of a lower-level API. Make sure the porcelain is composed entirely of the plumbing, and document the plumbing. It's
a good way for us to build systems. It's how Git is built.

All right, well, that's it for the internals. We can do anything we want at this point. This view of the tree of the .git directory is actually gonna get really annoying at this point, so I'm gonna use a script I have called log live that will just give an automatically refreshing one-line log, so we can see some things that are going on. It'll wrap a little bit, but it's okay.

Now, if I could just look (I'm trying to get outside of the light so I can see you): who rebases with confidence? There's like two, three hands. That's not many hands. Good for you; the rest of you, let's learn. Now, rebasing is often confused with merging, and it's really a very different thing. So let me give you an idea of what's going on here. I've got my one file, beowulf.txt, and, shamefully, that's actually all I know of that, so we can't go any farther there. We'll do something else; we need to go with Edgar Allan Poe. That's a good choice.

So I need a little bit of content in my master branch, and I'm gonna have some content in my feature branch. Of course, I don't have a feature branch yet; that's okay, we'll come back to that in a minute. Let's just do this, and I'll say, "Ah, distinctly I remember it was in the bleak December," and I'll save and quit, and I'll add that file, and I'll commit with "bleak". And there we go; the log updates automatically.

Now I need to have a branch, don't I? I need a feature branch to be able to show you how rebasing works, and I need to quickly remind you of what merging is. I'm sure many of you already branch and merge; work with me on this. Of course, what I want is a branch prior to master. I don't want a branch from where I am right now, and that's okay; I don't have to always branch from HEAD. I can branch from some other place, for example 3d09. I could do that, and that will put that branch right there.
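Recall from earlier that a branch is just a small file under .git/refs/heads holding a commit hash, so branching from an older commit only means writing that older hash under a new name. Below is a toy Python sketch of that idea (not from the talk). It assumes loose ref files in a throwaway directory, the hashes are placeholders rather than real commits, and create_branch and read_branch are hypothetical helpers. In a real repository, git branch or git update-ref is the safe way to do this.

```python
import tempfile
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def create_branch(git_dir: Path, name: str, commit_id: str) -> Path:
    """Create a branch by writing a full commit hash plus newline to refs/heads/<name>."""
    if len(commit_id) != 40 or not set(commit_id) <= HEX_DIGITS:
        raise ValueError("expected a full 40-character lowercase hex SHA-1")
    ref = git_dir / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    # 40 hex characters plus a newline: the 41 bytes described in the talk.
    ref.write_bytes((commit_id + "\n").encode("ascii"))
    return ref


def read_branch(git_dir: Path, name: str) -> str:
    """Return the commit hash a loose branch ref points at."""
    return (git_dir / "refs" / "heads" / name).read_text(encoding="ascii").strip()


if __name__ == "__main__":
    older_commit = "1" * 40  # placeholder standing in for the earlier commit
    newer_commit = "2" * 40  # placeholder standing in for the tip of master

    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp) / ".git"
        create_branch(git_dir, "master", newer_commit)
        # Branching "from some other place" is just naming the older commit.
        feature_ref = create_branch(git_dir, "feature", older_commit)
        print(read_branch(git_dir, "feature"))
        print(len(feature_ref.read_bytes()))
```

Nothing about the commits themselves changes when a branch is created this way; only a new name for an existing object appears.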
Another cool trick you may not know: HEAD caret (HEAD^). That means one commit prior to HEAD, the parent of HEAD, and HEAD only has one parent, so that's unambiguous. If I wanted to go back, like, five (obviously in this repo that's not going to work), I could say HEAD tilde five (HEAD~5), and that would work. I'm gonna say HEAD caret, and that's gonna throw that feature branch right back there. No, it's not. We're gonna [unclear] that to go here, for reasons I will be happy to explain later, and Raven is gonna have to get another commit. I was about to do something bad to myself, and you didn't see it happen, so that's good. "It was in the bleak December, and each separate dying ember wrought its ghost upon the floor." Ember.

Now master pulls ahead of feature. I can go check out feature, and I can go in here and add a title: feature branch. Now, conventionally, I would want to be back on the master branch, which I am now; you can tell just by looking at the log, HEAD is on master, HEAD's not on feature. So I'm on the master branch. To merge, I would just do that: merge in the feature branch. That creates a merge commit, and that's fine, as long as that's what you want your repo to look like.

Rebasing says: you know what, I don't necessarily want to have to create a merge commit. In fact, you saw me think, "Wait, where should my branch be from? Should I branch now? Should I branch later?" And at this point I'm saying I don't like that thought process. I think I branched at the wrong time, or in any event, I wish, rather than branching two days ago and doing all this work while all these extra commits happened on master, I could just branch from master right now and have Git automatically rewrite all my commits on top of master. That's what's about to happen. That's what a rebase is. This is a merge right here, and I'm still gonna need to merge, but I want to rebase first, and rebasing is a great way to handle this. Now, I
don't want to have to type out more stuff, so I'm going to show you a way that I can undo that merge, with a command called reset hard. Now, reset comes in three different flavors: soft, hard, and, famously, mixed. A hard reset is gonna change where the HEAD is. So I'm on the master branch; it's at ab09ed9 right now. I want it to be back where master was before I did the merge. You don't have to shout it out, but if you could look at the screen and get an idea in your head, make a prediction about which commit you think that is. Where was master before the merge? There's basically only one possible answer, and I don't know what your answer was, but I'm gonna tell you the right one is b7fda2d. Remember the commit called "ember"? That's where master was before the merge. And so if I want the merge to go away, I will simply say, let's reset to b7fa, which is... that's an FD, thank you guys. FDA. FDA, it's a government agency. All right, and now it's like the merge never happened.

So what happened to that commit? I didn't note what the merge commit was; I should have. I wonder, if I look down in the objects directory, if I could still see that prior commit there, and the associated tree and blob objects with that commit. The answer is yes. It's actually still lying around.

And so reset can be a very dangerous command. Number one, it rewrote my working tree, so if I had uncommitted changes in tracked files, those just got blown away permanently. There's no way to recover those uncommitted changes after a reset hard, so that's a little dangerous. But it could also be dangerous in that maybe you reset to the wrong place, or maybe you were on the wrong branch when you did the reset. I've done that in front of 400 people before, and you have to have some way of rethinking the problem and fixing it. Now, I wanted to do this reset, but let's say there's a possible world in which
I'm upset about that. I want to get that lost commit back. So I look at this thing called the reflog. Super, super, super strong sauce here. If you don't know about the reflog, you're gonna be glad that you did. The reflog isn't a list of commits. That thing over there that I've got in the pane on the right is not just a list of commits; it's actually the graph of commits rendered, and technically that's what the data structure is that the commits form together: the Git repository is a graph. Reflog isn't trying to spit those out. It's giving me a list of the changes that I've made that have caused commits to change in some way. If you bring a commit into existence with commit, that gets reflogged. If you back up over a commit with reset, that gets reflogged. If you merge a branch, if you check out, if you rebase, anything that's gonna cause refs to change gets reflogged. And this is so powerful, because I left ab07ed9; that used to be my merge commit. Now I can go back to it if I want. Now my merge is back. Now, I don't want that, okay? What I want to do is undo, so I'll do this. This usually means go to the most recent thing. And so I will.

And now I will show you a rebase. I don't want to merge; I want to rebase first. Now, as I said, let's suppose that my frustration is I don't want to have branched from "bleak"; I want to have branched from "ember". I don't want my branch to look like a branch. I want it to look like I just instantaneously did all this work right now, and that's what rebasing is gonna do. In this case we only have one commit in each; I could have 100 in each and it would work the same. It's gonna pull up all the commits in my feature branch and rewrite them on top of master, so that master is the parent. And in the process the hash is gonna change. It's not gonna be 0016a2c anymore; it's gonna be something new. And if you think about the fact that the way you get the commit hash is by hashing the raw contents of that
commit object, well, the contents are going to change. The parent is going to change, the tree is going to change. It's a different commit. So rebasing rewrites commits. Now, the syntax here is very simple: I want to pick up my branch and move it over to master. And so it's done. So it doesn't look like there's a branch anymore, and if I get on it right now and nobody else commits to master, I can go merge feature into master and I'll keep this nice straight-line history. Really cool. So this is actually a controversial topic.

Check out master, merge feature. This is going to be what's called a fast-forward merge. So if you'll count, I have four commits right now, and the HEAD is 2-something, but I can't quite see. And here, let's do this: merge feature. The HEAD is 20eb77f. Master is just gonna move up to point to that. Now that's what's called a fast-forward merge, where we just have to move. There's nothing in master that isn't already in the branch. I'm on master, I'm merging the branch into me, and Git knows it can just move master up to the head of the branch that I'm merging. Guaranteed to work. Merge conflicts are impossible. No new commit is necessary. It's a very clean kind of merge, and that's what rebasing buys you.

The broader story of rebasing (there were four hands I saw when I asked before, people who do that right now) is complex and borderline political. Interesting tribes form in the Git community around whether this is a good idea or a bad idea, and you can find that discussion online, or if I ever see you at a longer talk or a workshop, we can have that discussion and really think through the implications. But this is certainly a good thing in that it keeps history straight. It does have some consequences about what sort of collaboration you can do on branches, when you can rebase, whether you have pushed commits before. You know, if I had shared the commit that existed before, the 00-something, and then I rebase it, I
rewrite it, you know, that can be difficult for other people to deal with if I had already pushed that up to GitHub. So it's a big topic, and I would leave it at that. So there we are.

Now I've got a few more minutes. We're done at 12:25, right? We can probably do an interactive rebase, because I like interactive rebases, and they seem similar to rebases. So I'm gonna keep my feature branch around. That's a little bit sloppy, but let's just do that. I am now on the feature branch. Excellent.

Now let's suppose I'm one of those people who likes to commit a lot. In the real world I'm not. I'm one of those people who probably needs to commit a little bit more frequently, I don't mind admitting. But what I need to do is generate some random changes, and I'm gonna make ten files called random 1, 2, 3, 4, 5 .txt and commit them one at a time. So I'm this kind of person who likes to just commit every five minutes, like you were hitting Command-S in Pages or something like that. I just like to save a lot. Now that's cool, and honestly, if you are that sort of person, I don't think you should have any reason to want to stop. Just do it if that's comfortable for you. But sharing commits like that is a little bit antisocial, because each one of those commits (let's just look at the log) just has a single file. That most recent one adds random 10, and then random 9, and then random 8, and random 7, and random 6. It kind of looks like what you were trying to do was add ten files, and you just happened to split that work up across ten commits.

And when you get into advanced Git commands like revert and cherry-pick, when we try to make sense of how those commands actually work in the world, it turns out that you want commits that are cohesive and you want commits that are decoupled. So you want all of the things in a commit to have to do with one another, like you're telling one story. You don't want to tell four stories in one commit.
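The point made a little earlier, that a rebased commit gets a new hash because the hash is computed over the raw contents of the commit object, can be checked with a few lines of Python. This is a sketch under stated assumptions: it uses the SHA-1 object naming of a default Git repository, and the author line, timestamps, parent ids, and the `git_object_id` and `commit_body` helper names are invented for illustration. Only the parent changes between the two commits, the way it would when a commit is replayed onto a new base.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash '<type> <size>' + NUL + body with SHA-1, as Git does for object ids."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: str, message: str) -> bytes:
    """Build the raw text of a commit object with one parent."""
    # Author, committer, and timestamp are fixed, made-up values.
    who = "A U Thor <author@example.com> 1371427200 +0000"
    return (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {who}\n"
        f"committer {who}\n"
        f"\n{message}\n"
    ).encode()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of Git's empty tree
old_parent = "1" * 40  # hypothetical parent before the rebase
new_parent = "2" * 40  # hypothetical parent after the rebase

before = git_object_id("commit", commit_body(EMPTY_TREE, old_parent, "Add title"))
after = git_object_id("commit", commit_body(EMPTY_TREE, new_parent, "Add title"))

# Same tree, same message, same author: only the parent line differs.
print(before)
print(after)
print(before != after)
```

Because the parent id is part of the hashed text, moving a commit onto a different parent necessarily produces a different commit id, even when the files and message are unchanged.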
Like, you're working on some feature and you happen to notice some misspellings in the Javadoc of a class totally unrelated to the feature, and so you have this commit that's moving your feature along, oh, and by the way, here's a change to some other file with some misspellings in the Javadoc. Okay, you're really talking about two different things at once. If you did that in real life, you know, somebody said, "How was your weekend?" and you said, "Well, it was great. We saw Iron Man on Sunday at this dinner theater kind of new thing in Aurora, and the Mayans were an ancient civilization that stepped out for a bucket of chicken one day and never returned, and after dinner we came home and had a bottle of wine," people would be like, "Whoa, what?" Yeah, those are totally unrelated things. So don't do that in commits either. Cherry-pick and revert will break, and those are very powerful commands, if your commits are not cohesive.

Likewise, your commits should not be terribly coupled. Now, there is always going to be coupling between time-ordered snapshots of the development work that you do. That's unavoidable. But that's dumb: right here I've got ten things that really should be one. So if you find yourself making commits like this, it's okay. You just need to fix them before you share them, and you do that with this rebase interactive command.

So, rebase interactive: you don't really give it a branch. You say how far back in history you want to go, and in my case I'm saying ten commits ago is where I want to go, using that syntax that says ten commits before the HEAD. And what that's going to do is reconsider the HEAD, inclusive, up to that commit, exclusive. So if you actually did the math, HEAD~10 is 20eb77f. 20eb77f is not going to appear in our list. So when I hit enter now, I'm going to get vi confusing you, so if you don't mind... that wrapping is unacceptable there. Okay, the default Git editor pops up, so this is
sensitive to the EDITOR environment variable, or a Git configuration parameter called core.editor. In my case I have those unset, so I get the correct answer, which is vi, or Vim. This is a recipe that interactive rebase wants to execute to rewrite those commits. It's listing them for me in the reverse order that log normally does. So if you're paying attention, you'll see random 1, that's the oldest one, at the top. So oldest to newest, and it has an instruction in that first column. That instruction is pick. So the recipe I get will pick all ten commits in the same order and not change them at all, so rewrite them all to be exactly the same thing that they are, in the same order. It's kind of a no-op. But we've got at the bottom that little recipe book, where it says under Commands I can pick, reword, edit, squash, fixup, or I can run a script.

I'm gonna make this easy. I'm not gonna do a hideously complex rebase. Basically, I want to squash all those things into one. But I could, for example, say, when you get to random 5, if I made the command edit at random 5, it would stop the rebase and drop me back out on a command prompt, and I could do anything I wanted to that commit. It's completely mine to change. So using interactive rebase, you have absolute power to change history, as long as you haven't pushed.

So let me make some changes here. I'm going to squash a bunch of commits. You just have to use the first letter, and that is enough. Squash, squash... you know what, I'm gonna delete this line. I don't like the number six. All right, squash and squash. So you can't squash the top one, of course, because you've got to squash into something, but that's gonna take all of those now eight newer commits and compress them into that one older one. It's gonna get rid of random 6 like it never happened. When I save, it flashes a little progress message there and brings me to the commit editor. It's basically glommed all those commit messages together. I don't really want them, so
I'll just delete them, and I'll say "random files one through five and seven through ten." Save the file. Notice my vastly simpler history: feature is now just a single commit. That's a nice commit to share.

What if I screwed up? What if I did the wrong interactive rebase? Where are those commits? I don't have them anymore. Well, I do have the reflog, and I encourage you (probably just about done with that) even before you need the reflog, just start looking at it. You know, before you actually need to run it, after you do something like a commit or a reset or a pull, or if you do a rebase interactive, make it a habit to study the reflog. Just check it out, because it takes a little bit of learning to know exactly where to go. Rebase, for example, makes a lot of noise. The most recent thing says finish, and then squash, squash, squash, squash, squash, and then checkout, and you're like, "I don't remember doing a checkout. What's up with that?" Well, you do enough of this and you'll see an interactive rebase always starts with a checkout. So just keep an eye on it, watch it, and you'll learn. In my case, if I wanted to undo this, I would reset back to HEAD@{10}, or 73da042. Reset hard to there, and now all my horrible work is back. But that's not what I want. I want to use the single-step undo, which is there. Now it's back. So you've got a lot of power to reshape what's in history.

After you push, it's harder. If I had pushed these commits and then I rebase them, or I reset past the HEAD, and I try to push again, GitHub, or whatever upstream repository you're pushing to, is gonna say, well, wait a second, the way this is supposed to work is you have a commit, and then you have another commit that's a descendant, and then another commit that's a descendant of that, and as long as you give me new commits that are descendants of the commit I already have, well, then we're safe. I know you're not losing anything. But what if you give me a commit where I can't go reach
my head, I can't go back traversing parent links and get to my current head? That means you're gonna cut these commits off. Now, in this case it's fine, right? I know I have all the work I want. I want to push that stuff on top of whatever is upstream. But it's gonna require that you force that, take this extra violent step and, you know, beat the thing down, give it a force push. And everybody else who pulls on that branch, their next pull will also require a force pull, exposing them to the risk of losing code. So really, you don't want to do this to stuff that you've already pushed, already shared, unless you really have to. If there's a good reason, then do it. It becomes a synchronous activity: everybody on the team kind of has to get together and think about it at the same time and make sure that they're not losing work. So that's kind of big damage control mode, whereas this is just, I had this stuff locally, and I'm sloppy, and I want to make my work neat before I share with the people that I work with, because I love them and I want them to have a good time using my work.

So with that, that's basically all we have time for. We've seen how to make a commit without using add or commit. We've seen rebasing. We've seen reset and reflog. We've seen interactive rebasing. Those are things that, you know, if you can really reason about how rebase works, and you can get around the reflog and reset and things like that, you are way ahead of most Git users. And if you've got a little bit of insight into how those internals work, what I hope is you'll be unlocked mentally to be able to think about this thing like a machine. As soon as you know some of those machine parts inside, it takes the mystery away and you're willing to reason about it, and I really want you to be. It's not that hard to learn with just a few tips like this. So thank you very much for being here. I really hope you enjoy the rest of the show. Thanks. [Applause]
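The rule described near the end of the talk, that an upstream repository accepts a push only when its current head can be reached from the incoming commit by following parent links, amounts to an ancestor check on the commit graph. The sketch below illustrates that check and is not Git's implementation: the `parents` mapping, the single-letter commit names, and the `is_ancestor` and `push_needs_force` helpers are all hypothetical. The same test also decides whether a merge can be a fast-forward.

```python
def is_ancestor(parents, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    stack = [descendant]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def push_needs_force(parents, remote_tip, local_tip):
    """A plain push is safe only when the remote tip is an ancestor of ours."""
    return not is_ancestor(parents, remote_tip, local_tip)


# Hypothetical history: A <- B <- C was pushed. D builds on C, while C2 is
# a rewritten version of C (for example after a rebase) that also sits on B.
parents = {"A": [], "B": ["A"], "C": ["B"], "C2": ["B"], "D": ["C"]}

print(push_needs_force(parents, "C", "D"))   # False: D descends from C
print(push_needs_force(parents, "C", "C2"))  # True: C is not reachable from C2
```

In the second case the upstream would have to drop C to accept C2, which is the situation the talk describes as requiring a force push.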
Info
Channel: InfoQ
Views: 84,211
Rating: 4.9160371 out of 5
Keywords: tim berglund, github, advanced git, git internals, marakana, techtv, jax, Git (Revision Control Software)
Id: MYP56QJpDr4
Length: 55min 45sec (3345 seconds)
Published: Mon Jun 17 2013