Implementing (parts of) git from scratch in Rust

Captions
Hello folks, welcome back to another Rust stream. This is going to be another one where we implement something from scratch. And in particular, it's going to be one of those streams where we go through basically a sort of guided set of exercises or challenges. We've done a couple of these. So we did one following the fly.io distributed systems challenge. And we did one where we did the code crafters implement your own BitTorrent one. Both of those were really popular and I've been talking to a couple of people and it seems like one of the reasons is because people can try the thing on their own at their own pace and then sort of switch over to the video and view me as a sort of reference as you go and compare their solution to mine at each step of the challenge. And so, you know, if that is a... A way that people enjoy going through the learning and I totally understand why, then I should do more of them. And so here I am, I'm going to do another one. I also think they're really fun. So today we're going to build Git from scratch. We are going to follow a CodeCrafters thing here as well. Apologies for the saw outside my window. Hopefully it's not too bad. So we're going to build Git from scratch following the CodeCrafters Git challenge. I've been told that the last step of this challenge, the clone a repository, is the hardest single step in any of the CodeCrafter challenges. So we'll see whether we get to it. It depends how long we actually get through. But we'll see how it all pans out. So CodeCrafters is not free. It is a paid site. But there is two ways that you can get access to this. The first of them is I have a referral link you can use. I'll put that in the video description. I also put it in chat here. If you go through that link, then you get seven days free or something. They also have the entire challenge on GitHub. I'll put that in the video description as well and in chat, that has all of the steps of the challenge and the text of the challenges.. It doesn't have all the nice infrastructure that is on the site for like running your test suite and stuff, but at least it's there. And I think all the tests are there too. So if you don't want to pay for it, then you can do it that way instead. And as a sort of, I guess, disclaimer at the beginning here, I'm not sponsored to do the stream. Like no one has paid me to make this video. I do have the referral link. So if people like it and pay for it, then I get money from it. But it's not a paid thing. Before we get started, there's just a very short amount of housekeeping. And I know people hate housekeeping because it delays the start of the stream, but we'll do it anyway. The first one is that there is now, if you haven't seen it already, there is a Discord for I guess me, specifically for Jay here. Jay has a Discord server that I happen to be on. The Discord server mainly has announcements of... whenever I do new videos, when I'm planning to do new videos, and other things that might be relevant, and sort of live notifications and such. And you can find that at discord.johnhu.eu. That'll redirect you to the invite link. And I guess I can also put the invite link right here in chat. so that you have a handy way to get to it quickly. There are a bunch of other channels too that are only available to people who sponsor me. So that brings me to the second thing, or the third, I guess, which is I now have a GitHub sponsors. There is no requirement to sponsor me. 
You don't get anything particularly fancy except access to like a couple of these Discord channels. The main thing is if you find that you've gotten something valuable out of the streams that I've done and the other content that I've produced over the years, then I would- I'd appreciate if you could sponsor me and sort of help me do more of this, but I'm also in a stable job, so this is not a thing that sort of sustains my living. It is more of a way to sort of go the extra mile. Okay, and then the last bit is I will be at Rust Nation at the end of March. Helsing, who I now work for, is sponsoring a hackathon at Rust Nation. So I'm going to be there running a hackathon with like remote control drones and cars. I think it's going to be pretty cool. If you happen to be there, stop by and say hi. And that, I think, is all the housekeeping I wanted to do. And so now we can get started. I have not looked at this challenge at all before starting. So this is sort of my normal style of how to do these videos, right? Is that I don't want to go in already knowing what I'm going to build. Because if I already know what I'm going to build, it's going to be much less educational for you. And it's also fun to watch me get stuck. So, we're going to start this challenge sort of blind, and hopefully this will go well. We'll see how far we get through the challenges. My guess is, based on the difficulties here, my guess is we'll get to the last one. We might not get through the last one. Sort of depends on how long I'm willing to stream today as well. I'm guessing we'll stream for... 4-5 hours, but we'll see. Okay, are there any questions before we start? Before I click the start building button here. This is very authentic, good. Just like at home. The Discord is for anyone, like you don't need to be a sponsor to join the Discord. It's just that the only channels that are on the Discord, if you're not a sponsor, are sort of the announcement channels. Like there's no actual chat channel. And then depending on the tier you sponsor at, you get access to sort of a community chat channel. One where I post interesting tidbits that I come across. And then this sort of goes a little bit up from there. So there's a more frequent Q&A tier. And there's also a tier where you can sort of suggest additional streams I should do and such that are further up. Will you implement rebase? Probably not. I think, you know, rebase to me would come after clone a repository here. I think it's worth pointing out that, you know, I've worked a lot with Git. So it's not like I don't know what Git is. And in fact, I know a decent amount about the data model for Git too, because sometimes that's how you have to debug why Git doesn't do what you want. I just haven't actually looked at this challenge before. ...prerequisite for this. So, this is going to assume that you know Rust. Like, I'm not going to teach you Rust in this stream. But if you don't know Rust, and, like, hopefully you should be able to follow along. It might just be some parts of it where you're like, oh, I don't know what that piece of code does or the syntax does, but you should still generally be able to follow along. But the goal of this is someone who generally knows Rust and then wants to sort of... 
see someone with experience building this and hopefully over the course of that you'll get exposed to some techniques some libraries some ways to use the standard library and just techniques for programming in rust more broadly and potentially also debugging techniques so i'm hoping sort of teach some intermediate rust concepts as we go through this as well If you want an intro to Rust, one thing that I can recommend is, so a couple of years ago, I ran this class or like a mini class at MIT with two other lab mates of mine called the missing semester of your CS education. It's missing.ccl.mit.edu. I'll put the link in chat. Nope, that's not the link. This is the link. And it has a lecture on Git. It's not given by me, it's given by Anish, who's also a great lecturer. And if you go in there, it has like both the a bunch of written notes throughout this. And it also has the entire lecture video, at least in theory, if I can get that to load. And so this one is a, I think a really good walkthrough of sort of the mental model of Git, but also the data storage model of Git. We'll get into a lot of the details of this as we go through the stream, but if you're completely new to Git, I would recommend checking that out. All right, so let's then get started. I don't think there are any other burning questions. So let's go for it. Start building. I would like to do it in Rust please. Language proficiency. I'm gonna go with advanced. Next question. How often do you intend to practice? Once a month. Accountability? I'll pass. I don't want accountability for this. Alright. Step one, clone the repository. That's fine, we can do that. Git clone. Okay, and then they want an empty commit. I can do that. Right, so one of the things that CodeCrafters has set up, and I remember this from when we did the BitTorrent challenge, is that there's sort of a push hook. So whenever you do a git push, it runs all of the test suite over your things. You see here, it built the Rust app, it ran the thing, and it's telling me that basically test one failed. Which is totally fine, that's sort of expected. I guess we can go here. Woohoo! Okay, great. They received the git push, so that means we're now set up. Um. Okay, that's fine. Your next stage, implement the git init command. Okay, so the idea here is that we will have, they'll run our binary basically as the git binary. And the git binary has an init command. It initializes by creating a git directory with some files and directories inside of it. Okay. Ooh, yeah. What's in the git directory? Bunch of text files. Yeah, that's fine. I'm guessing this is the same thing that's described below. Alright, so we gotta look at what we have in here. Let me get another couple of terminals in here. Let's see what we have. Source main. All right, I like clap, so we're going to go ahead and add in clap here. And then I can never remember the setup for this. It is like this. Oops. Great. like so. And the arguments we want here is actually we want subcommands, right? So in clap derive reference. Subcommand. There's the example for subcommands. Yeah, here. Okay, so what I actually want here is I want subcommands, right? For things like init. And command here is one of these guys. And I don't actually have any global arguments as of yet. I do want to include here subcommand, which I want to derive. And actually let's use the tutorial instead, which has subcommand without all the bits. So I don't need this, don't need this. And get rid of this init. 
And initially, init is going to take no arguments. We do want to derive Debug for it. And then down here, instead of doing the env::args approach that they're using, what we will instead do is match on args.command. The only valid command initially is init. And we don't actually need the else branch they have for an unknown command, because clap will take care of erroring out for us and giving a useful message if a subcommand was given that isn't one we support. And then, okay, what do we do here? We create a directory .git. We create .git/objects and .git/refs. We write "ref: refs/heads/main" to .git/HEAD. And we print "Initialized git directory". All right. Just to see what that does. That's fine. That's all fine. First exercise. git push. Let's go look at what it says here. So we get an explanation as well. Okay, yeah, so we have this directory structure, we have a HEAD file. The objects directory contains Git objects, refs contains Git references, and HEAD contains a reference to the currently checked-out branch. So in this case, it says that the main branch is checked out. Okay, so let me give you a brief overview of Git here, rather than diving into... So this is linking into the Git book, which is a good read. But just in the interest of time, rather than try to read through it, I'll give you a very rough headline overview. So... In Git, everything is stored as, roughly speaking, a content-addressed blob. That means if you have a file, for example, the file is really stored as all of its bytes, keyed by the hash of those bytes. And so this is an object. Similarly, a commit is also an object. So a commit is an object that has a... Well, actually, there are a bunch of things in between here. So a blob is the lowest-level thing: an object whose key is the hash of the blob's contents and whose contents are the contents of that file. A tree object is something that holds something like a directory. And what it really holds is a list of name and object key pairs for the contents of that directory at that time. So you could imagine that the tree for... well, I don't really want to use the same name, but let's say I do a mkdir foo here and I touch... or I echo "hello" into foo/bar. Then the way this will actually be stored is that this sort of thing will be stored somewhere, and that is the tree entry that we would have for foo/bar. So for the repository here that's rooted at foo, we would store the hash of the contents of foo/bar and the name bar in a tree object. And we would also, separately in the object store, store that hash and the contents of foo/bar like this. And then where would this actual string right here, the tree entry, be stored? Well, that would be stored in a tree object. And the tree object is keyed by the hash of this literal string. And where is that hash stored? That object key is stored in the commit object. So when you create a commit, the commit points to a tree, and in particular points to the object key, the hash, of the root tree object, as well as including metadata, like who authored it and such. And then the commit is stored as all of the bytes that make that up, keyed by the hash of that content. And that hash is what we know as a commit hash.
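To make that concrete, here is a minimal sketch of how you might model the three object kinds discussed above in Rust. These types are my own illustration, not anything the challenge prescribes; the on-disk detail to remember is that every object, whatever its kind, is serialized as `<kind> <size>\0<payload>`, zlib-compressed, and stored under the hex SHA-1 of the uncompressed bytes.

```rust
// Sketch only: hypothetical types for the three object kinds.
// On disk, each object is "<kind> <byte-len>\0<payload>", zlib-compressed,
// stored at .git/objects/<first two hex chars>/<remaining 38 hex chars>.
enum Object {
    // Raw file contents; no name, no permissions.
    Blob(Vec<u8>),
    // A directory listing: one entry per child, each pointing at another object.
    Tree(Vec<TreeEntry>),
    // Points at a root tree, plus parents and metadata.
    Commit {
        tree: [u8; 20],         // SHA-1 of the root tree object
        parents: Vec<[u8; 20]>, // zero or more parent commit hashes
        author: String,
        message: String,
    },
}

struct TreeEntry {
    mode: String,   // e.g. "100644" for a regular file, "40000" for a subtree
    name: String,   // file or directory name within this tree
    hash: [u8; 20], // object key of the blob or subtree it refers to
}
```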
So if I do rev parse head, so this hash right here is the hash of the commit object that head points to. And similarly, you know, if I do master here, for example, it's the same hash because master and head currently point to the same thing. If I did something like head hat, which means the previous commit, this is the hash of the commit object before head. And I can even do cat. No. What is the name of that command? Maybe it is just cat file. Yeah, so cat file lets you print out what is contained within an object. So for example here, we can do cat file commit head. And this, sorry about the chainsaw, this is the entirety of the contents of the commit at head. And if I did a rev parse of head, and then I did a cat file of this commit, it prints the same thing. Okay. Right, because ultimately the commit is keyed by its hash, just like everything in Git is. So the core of Git, in a sense, is this object store. And the object store just stores maps from hashes to the body of the thing that contains that hash. So whether it's a commit or a tree or a blob, whatever it is, it's all stored in the same object store that has the same rules, which is it's a content hashed storage. And so you see, when we look at the cat file here of this commit... You see the contents of it is the tree for this commit. So basically the contents of all the files recursively all the way down is this hash. The parent commit has this hash and the information about the author, the committer and the commit message. And then we can keep walking down, right? So we can do up here. we can cat file tree this tree. And this might not immediately look very reasonable, but this is basically a serialized representation of the tree. And in particular, the things you'll see here is you'll notice that there's the mode of the file. So this is like, is this readable? Is it writable? The name of the file. And then these bits in between are basically a binary representation of, you guessed it, the... object key for the contents of those files in this particular tree, in this particular commit. And one of the reasons why you have this structure is because, for instance, if you create a new commit where you change nothing, right? So we did a commit earlier that was like allow empty. Then it can use the same tree. So you don't need to store the entire tree twice once for each commit. You can have two commit objects that have the same tree reference embedded in them. And similarly, if you have two trees that differ only in like the contents of one file, then the tree objects are going to be different, but they're going to reuse the same object key for the blobs of most of the files, except for the one file that's changed. Right? And so that's sort of the setup here. And you'll see here, like you see here is source, and then source just points at its own tree, right? So if I, that's not really a nice way to turn this back into the hash that we need. But there is then a tree object for the sort of hash. hash equivalent of this binary string. And if we cat file that, we would see a similar tree object that would list all of the things that are under source, which is main. Okay. So that's what we're creating here. So we're creating objects, which is the sort of root store for this content hashable or content addressable store. And then refs is really just a mapping between human readable names, like the name of a branch, for example, and which commit hash that reference is to. So if we, in fact, look inside the git that we have for this. 
And we look inside of refs, you can see here that the refs are in heads, you can see master. And if we cat master, you'll see that that is the same git rev parse, the same hash as we get back on head. when we parse out the commit hash. So that's just where those are stored. If you had multiple branches, there would be different heads under here. At the moment, we only have the one. But under refs, we also have remotes. So remotes is the same kind of refs except for references that are stored remotely, such as branch pointers on GitHub, for instance, from the last time you did a git pull. So if I do a cat of origin head here, for example, and I'm going to do a cat of origin head here, You see that it doesn't have a commit hash, instead it says head is the same as, that's what this ref here means, the same as ref's remote's origin master. So that means head is pointing to master. And if we cat master, you see that it's the same thing as here because we did a push. If I do a git commit allow empty like this, test2, then now if I cat... If I rev parse head, you see this commit hash is now different. Similarly, if I now cat master, that's the same because that's the local one. But if I cat the master in the remote origin, I still get the old commit because that's where that points. And so now you're sort of starting to see how this comes together. When you do a commit, you update your local refs. When you push, what you really do is you're telling the remote, hey, update your refs master to have this commit instead. This commit hash, rather. The other things that are in refs is tags, which obviously every tag has some name and has to point to the hash of the commit object that that tag is pointing to. And if we look at objects, you see this is almost entirely just a list of files by hash. And in fact, we could even find here, so head was that. So get objects 6f. You'll see here that. The first two characters are used as a directory name and then the rest are flat files. The reason for that is just because Linux doesn't like it if you have a bajillion files all in the same directory. So splitting by the first two characters of the hex just allows the file system representation of this to be a little bit more efficient. But you see here 6fc85, 6fc85. So this is the same object. And indeed, if we now cat this. It's a binary file, but really the contents of that is commit is the same as this. It's just a compressed version of it. Okay. If you hex dump that, we see the same hash for tree. So you want to hex dump this file. I don't think that's going to help you too much. No, this is not just like the binary representation of that because it's ASCII, it would be the same. It's like, I believe these are compressed. I forget exactly. I'm guessing we're going to get to that in one of the first exercises. Okay, so that hopefully now you have a sense of what are in each of these two directories. So there's objects, which is the object store, refs, which we just talked about, and head, which is just a special kind of ref. It is specifically which commit is currently checked out. That's what head means. And initially that's just going to be the main branch. That's fine. So we did this first thing where we now, you know, we create, we just create the correct directories, and then we also create the head file, right? So if you look at main here, we created git, we created git objects, git refs, and we created this git head file that points to main. That's what we did. 
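For reference, the init step described above amounts to roughly the following. This is a sketch of my own (error messages and exact structure are illustrative), assuming clap with the derive feature and anyhow as discussed:

```rust
use std::fs;

use anyhow::Context;
use clap::{Parser, Subcommand};

#[derive(Parser, Debug)]
struct Args {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand, Debug)]
enum Command {
    /// Create the bare-bones .git directory layout.
    Init,
}

fn main() -> anyhow::Result<()> {
    let args = Args::parse();
    match args.command {
        Command::Init => {
            fs::create_dir(".git").context("create .git")?;
            fs::create_dir(".git/objects").context("create .git/objects")?;
            fs::create_dir(".git/refs").context("create .git/refs")?;
            // HEAD is a symbolic ref pointing at the currently checked-out branch.
            fs::write(".git/HEAD", "ref: refs/heads/main\n").context("write HEAD")?;
            println!("Initialized git directory");
        }
    }
    Ok(())
}
```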
Git actually creates more files and directories, right? So if you go look in Git, there's a bunch of other things here like logs and branches and stuff. We're just going to ignore those for now. These are all we need to get started. There's zlib compressed. That sounds about right. Git has a new line at the end. That's fine. Okay, view next stage. Read a blob object. So this is the cat file command that we actually just used. And so I'm guessing here the goal is going to be that you can cat file some blob and it just prints what the contents of this is. So I'm guessing here we're going to get to the point of doing a decompress, right? Which is exactly what we expect. We'll deal with three Git objects, blobs, trees, and commits. Right, so we've talked about all these so far. Blobs are sort of the lowest level thing, which is just the contents of a file. Yep, only store the contents of the file, not names or permissions. And this is so that, for example, if you have two copies of the exact same file in your directory tree, it's only stored once because the tree just stores the same hash twice under different names. Yes, it's a SHA-1 hash known as the object hash. This is an example. Git object storage. They're storing Git objects. We looked at that. We looked at how the path would be structured. Each Git object has its own format for storage, right? So blobs, trees, and commits have a different on-disk binary representation from each other. And this is presumably because you want to optimize for, you know, both how they're used and how they compress the best. I'm guessing we'll see that in a second. So for blob storage, they are Zlib compressed. Yeah, so whoever was in chat was totally on board with this. A format of a blob object looks like this. Blob space size, a null byte, and then the contents. Okay. So we can sort of verify this if we want, right? So the... What is the easiest way for me to do this, actually? I just want to see if I can find like if we look under dot git slash object objects and we look at I don't know F9. What what is in you? That doesn't look like a blob. 60. That doesn't look good at all. But notice how it starts with the same kind of thing. So I'm guessing these are then probably either commits or trees. 9f. So that starts with xk. So that's a different type of file. 6f. That we know is a commit because we looked at that one earlier. 73. Okay, yeah, so these are all compressed. I don't think we're going to get something useful out of just looking at them. So let's instead try to just implement this first bit. So we're going to have to... Oh, I see, because they're written like this and then Zlib compressed. So all we're seeing is the binary output of the Zlib compression. And so we need to Zlib decompress and then we'll see this bit. Okay, so let's then add here a cat file. And we should be able to here now do command cat file. And obviously cat file is actually take some arguments, right? In particular, it takes which thing that you actually want to print out. So it takes a, what's the dash P here for? Let's go look at cat file. So cat file takes a dash P and I'm guessing PP pretty print based on its type. Okay, so let's go up here and say that cat file is going to take a pretty print pretty print I can't type which is going to be a clap short like this. So it's going to take that, but it also is going to take a positional argument, right? Which is the object hash, which is going to be a... We can choose how we want to structure this, right? 
So we could say a string. Alternatively, we could say something like... I actually wonder whether... Yeah, it doesn't... Yeah, I was afraid of that. I was hoping the clap might be smart enough to realize that this is something that can be passed in as a string to validate that it's exactly 40 characters. I think instead what we'll do is... is just take a string here and then if they give a hash that doesn't have the right length then we'll just you know error out. There is actually the point here that I believe catfile is a little bit smarter in that it allows you to deduplicate so you only need to give the longest or the shortest prefix that is unique so here because there are no other hashes than the one for this one that starts with this it actually lets you get away with specifying fewer. So that's even more of a reason here to specify string. So this is also going to have the object hash. And then in cat file, we go down here. Yes, we'll have to read the blob object, decompress, extract the content and print the contents to standard out. Okay, shouldn't be too bad. So let's go ahead and do that. The cat file must not contain a new line. That's fine. You can use flay2. Okay. So here's what we're going to do. We have to open the file. In fact, we don't... So there are two ways we could do this. We could either read the entire file into memory and then decompress it in memory and then do what we want with the output. Or we can open an IO reader here, which is realistically the nicer thing to do. File. Open, and we're going to try to open, I'm going to leave a to do here, support. shortest unique object hashes, right? Because what we're going to do here for now, at least, is to just open a file that is at exactly where the path should be according to the object hash. But realistically, we basically want to use something like glob here to put a star at the end and find the number of unique files or the number of files that match that prefix. And if there are more than one, then we error. saying you need to choose. If there's only one, then we give that file. But for now, let's do the whole thing. So what we want to open here is.git slash objects slash, and then the first two characters, and then the rest, where format. So what that is going to be is object hash two and object hash. Like so. So that is going to open the the file and then we're going to use the flake to create. And the FLATE2 crate basically provides different kinds of compression and decompression things. In our case, what we want is a reader and in particular I want to decompress. Do they have a copy paste example I can just use because that would be nice. This is for compression. I don't need multi-member, I guess I'll do read Zlib decoder. Aha! Amazing. So, oh, and I don't really need env here. That one can go away. And so down here, I'm then going to create a new Zlib decoder. And the ZlibDecoder new here takes anything that implements read and files implement read. So we can just pass that in right here. We get a ZlibDecoder back. And... and then it implements read for Zblit decoder, right? So the idea here is that it's sort of a streaming decoder. So it reads from the file, decompresses, and produces a thing that you can then read from. In our case, the question then becomes, okay, what do we want to read out here? Well, really what we're reading out is just a string, but it is a string with a little bit of structure to it, right? So it has this structure right here. 
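Concretely, the open-and-decode step being described looks roughly like this — a sketch with my own function name, assuming the full 40-character hash for now:

```rust
use anyhow::Context;

// Sketch: open an object file by its (full, 40-character) hex hash and wrap it
// in a streaming zlib decoder; everything read from the returned reader is
// decompressed bytes.
fn open_object(object_hash: &str) -> anyhow::Result<impl std::io::Read> {
    let f = std::fs::File::open(format!(
        ".git/objects/{}/{}",
        &object_hash[..2],
        &object_hash[2..]
    ))
    .context("open in .git/objects")?;
    Ok(flate2::read::ZlibDecoder::new(f))
}
```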
So, what we can do is either write a custom implementation over read here, or, in fact, here's what I think I want to do. What is size here? It's the size of the content in bytes. Yeah, but how many bits though? I see, okay, it's variable-length, decimal-encoded, so this header is really just a null-terminated string. Okay, so in that case, what we can do is use z... in fact, I think what I want here is z is BufReader::new of z. So BufReader... oh, all right, I guess we'll return an anyhow::Result here. We'll do a cargo add anyhow, so that at the end here we can do this, and we can use context to make it a little bit nicer than just unwrapping everywhere: "open in .git/objects". So BufReader is a type from the standard library that basically keeps both the reader you pass in and a buffer that it's reading into. And because it keeps that buffer and doesn't just hand you whatever comes out of each read, it can do smarter things with this growable buffer, like read until you hit a given character: it'll keep filling the buffer until it hits something that matches. Which is exactly what we want here, right? We can do z.read_until: we want to read until we get a zero byte, and the buffer we're going to give it is buf, a Vec::new. So we're going to read until that into this buffer, like this. "Read header from .git/objects" is really what we're going to do. And these don't need to be mut. Okay, so if we look at the docs for read_until, it says: until the delimiter byte or end of file is reached, all bytes up to and including the delimiter will be appended to buf. Okay, so in theory at least, this means that after doing this read, the stuff that's in here is technically a valid C string. So in the standard library... nope, CStr. There's a type called CStr in std::ffi that represents a borrowed C string, which is really what we have here — a C string being a null-terminated array of bytes. That's exactly what this is. You can construct these safely from a u8 slice, which is exactly what we want here. So we can do... I wish this was the default view. So what I want is: let header = CStr::... what's the name of it? It's from_bytes_with_nul. Yep. This one is safe, but it does scan the string once to make sure there aren't any null bytes within the slice you gave it and that there is one at the end. I think that's okay. Technically we could have a smarter implementation here that says we already know there is no null inside, so we could use the unsafe constructor. But in reality, I don't think it actually costs us that much, and it means we don't have to use unsafe, so let's just stick with it. So we can do this and use expect, because we can say: no, there is exactly one null and it's at the end, so this isn't an error that should ever be possible at runtime. And I'll bring CStr into scope here. And at this point, back here, we were told the structure of this is "blob", a space, and then the size. So we should now be able to do header dot... in fact, header.to_str, with the context ".git/objects file header isn't valid UTF-8". And remember here that really it's ASCII, but all ASCII is valid UTF-8. And so at this point we should now be able to do: let size = header.strip_prefix.
Uh, we want to strip out the blob space that we know is at the beginning. And if there isn't a blob space at the beginning, then something is terribly wrong, right? And we can bail and say, get objects file header did not start with blob. And then we can even give the string that was actually produced, right? Which we can do here. And then we want to say let size is size.parse. And we want to parse this as a use size. It should not be a signed number. The negative wouldn't make sense here. And here too we can give the context of get objects file header has invalid size. And we'll print that out as well. So now we have the size. Now we know that the... Now we know how long the content should be. And so we should now be able to say buff.reserveExact. In fact, we can truncate it to zero, which is really just a clear. And then we can do reserveExact. And we want to reserve exact size, right? So this is saying clear all the stuff that was in the buffer that we read into previously, right? Which is this header bit. And then we want to reserve space for exactly this much, which is how long git claims that the file should be. And at this point, we can now do z.readToEnd. In fact, is there a.read? There's a readExact. That's what I want. See. Okay, so there are two ways to go about this. The question really is, what do we want to happen if this is wrong? Like if the size here doesn't match how much is actually in the file. So we have two options here. We can either do read until end. And what read to end will do is if I pass in the buffer here, it will read until it hits end of file and it will grow buff if necessary. So if the real file is way larger than what the size header said, then this will just keep growing buff potentially until it produces a giant file. And then we could check afterwards that it matches the size, but still. The other alternative is to use read exact. Right? ReadExact takes a slice, so you have to take the vec and produce a slice of that size, and then it will read exactly that many bytes. If it hits end of file before that, it returns an error, which kind of is what we want here, right? Unexpected end of file. If you read something shorter, and then we would, after doing the readExact, we would check that the next read returns end of file. That's sort of the nicer way to do this, right? So I think we're going to go that way for now. So in that case, what we actually want is we want to slice into buff of the right size. So if we go now to... I forget exactly what this method is called, right? Because what we really want here is a... method that returns... Yeah. So, yeah, sure, fuck it. Let's do this properly. So, here's what we're going to do next. I'm going to do it the slow way first and then tell you why it's slow and then how we could fix it. So, let's do... So, where do I want to start with this? Z.ReadExact. at mute something. So this has to be a slice. If I just give buff here, right? Like I can do this. Uh, uh, get objects file had unex, uh, uh, contents did not match expectation. It's a little bit of a weird error. The problem here is that this slice will be of length 0, because we cleared the buffer and then we reserved this many bytes, but the length of the vector is still 0. We made the capacity be large, or whatever the size dictates, but we did not actually set the length to anything. And the reason we did that, of course, is if here, if the length was set to size, what would the value of buff 0 be? Or worse yet, like size minus one. 
It doesn't have a defined value because we haven't set it to anything. And if there's one thing Rust dislikes, it is like memory with undefined values, right? So there are a couple of ways we can go about this. One of them is resize. So with resize, I can set size and I can say that every value should be given the value of zero. So instead of reserve exact here, I do resize to size of zero. The zero here basically means every element should be given the value of zero. So allocate to size and then set everything to zero. This works. The problem is we're writing a bunch of zeros that we're then immediately going to overwrite. So that feels kind of stupid, right? Like it's an unnecessary performance penalty. What we really want to say is I just want it to be the size and then they're all going to be overwritten anyway. There is a way to do this, which is to use the maybe uninit type. And what you would do is you would have a instead of having a vec of u8 you have a vec of maybe you an init of u8 and then you can resize it you can basically set its length to be you can here you would do maybe an init right. So you would set the length to just have uninitialized bytes the entire way and then you would cast it into the u8 buff that's needed here through an unsafe call. And then afterwards you would assert that the vector now holds initialized bytes. We're not actually going to do that here because it's a little bit of a wonky code and I think it distracts too much from what we actually want to do here, but it's just a worthwhile thing to point out. One of the reasons I don't want to do it is because one of the methods I know I'm going to want here is a... way to turn a slice of maybe uninit u8s into a slice of u8s. There's a nightly, like an unstable function to do this, but it's not stabilized yet. And so I don't want to also need to opt into nightly here. This will turn into an optimized memset, I believe, but even so, you're still telling the computer, write all these zeros. So we're going to stick with this for now. And what readExact returns, you'll see it's an IO result with unit, right? Because it doesn't need to tell you how many bytes it read because you told it read exactly this many bytes. And so now, if we now do a read, and I'm going to pass it a empty, I'm going to pass it a one byte long thing here. Yeah. N is this. I guess this is really read contents of, read true contents of GitObjects file. And this is validate end of file in GitObject file. And here we should assert that n is zero, right? So there should be no bytes after the size that the header required that we do. And here we can use, I think it's ensure, right? n equals zero or get object file had n trailing bytes. Okay, so in theory now we're reading out this whole thing, we write it into the buffer, and at the end of all of this, the question now becomes what do we actually print out? So the thing that comes after here, the content here is like, It's just a bunch of bytes. There's no guarantee that this is actually a string. It could be an image, right? Any file that you can put into Git will have this structure. And so when it says cat file, it's worth pointing out that the stuff we end up with is just a binary blob. And so we don't actually want to print it using like println because println will require that thing you gave it as a string. So instead what we're going to do... is take stood out which is going to be IO standard out So this gives us a handle to standard out. 
And then we can also even do, if we want to sort of maximize performance here, we could lock standard out so that no one else writes to it. And so that we don't get any garbled output. There is no concurrency here, so it doesn't really matter. But if you, so you can write directly into this, but what that means is every time you write a new chunk, it takes the lock again and again. Here, we're just going to lock it and then do all our writes at once. And so now we can do write to stdout. And we want to write, in fact, I don't want to do that. I want to stdout.writeAll of buff. And then this will be write object contents to stdout. And then this needs to be mute. Okay, let's try to cargo run cat file. The P doesn't do anything at the moment. Let's see what that does. Oh, right. So it tells me, well, I couldn't do that because you didn't tell me about an object hash. So let's give it an object hash. Cat file this. Git object file header did not start with blob, it started with commit. Right, so we've only told it how to print blobs. We haven't put in handling for things like printing a commit yet. That makes sense. So instead, I guess up here, where we're currently looking for the blob header, what we actually want to do is say kind and size is header. split once on a space. And we do expect that that should always be the case. Otherwise, did not start with a known type. And then we're going to match on the... And then the size can keep going here. I'm guessing they all have the same structure for this header. But down here, what we want to do is... Um... match on the kind if it's blob then we do this if it is anything else then we do write to standard out do Ah, so now I know what the p flag is doing. Is almost certainly that if... I'm gonna guess that if the p flag is not given, that it prints out just the raw decompress. So if I did here, if I do git cat file this... I see you need to give the type. And if you give dash p... Never mind then. So if we get anything that's not blobbed, then we say we do not yet know how to print a kind. And this needs to be a comma. And I've done something else wrong. What else have I done wrong? Ah, there we go. Buff is borrowed as mutable. Ah, so this is because we're still borrowing kind here. So let's go up here and do enum kind is blob. And blob is the only one we have here. And so down here we can say let kind is match kind. And we'll make this a little bit nicer. So blob is going to do kind blob. Anything else we do not know how to do. So we'll do, in fact, we'll do anyhow bail. Like this. And then down here, this is now going to be kind blob. And we can get rid of this guy. So now if I do this, it'll say we do not yet know how to print a commit. Fantastic. Do we have something that is a blob is the question. Yes, we probably do. We can print out this tree. Like so. Oops, I meant to use the git version here. Oh, tree. Can I like pretty printed? I probably just doesn't say aha, pretty print. Okay, great. That's what I wanted. So let's try to do I don't know, get ignore is probably a nice short file. And so now if I run our version and do cat file of this. Aha! And look, we indeed printed out what does indeed look like a gitignore file. Amazing! Okay, so that's really nice. Amazing. So now we have a thing that actually does print out what we wanted. We can get rid of this logs from your program will happen here. Let's make that go to E print line. And I think actually up here we're going to do anyhow ensure pretty print. 
We add that ensure because that's what the real git command does, right? When I tried git cat-file on this, it just gives me the help text and says you need to give both a type and an object, or you need to give dash p. And so this is "mode must be given without -p, and we don't support mode". Great. Someone else in chat: is it possible to redirect the read of the payload directly into stdout without the Vec allocation? It is. So the proposal here is... actually, let me push this first and then show you that change. Great, yep. Commit second exercise. git push. Let's see what it does. I don't know why we have tokio in here. Probably get rid of tokio here and then get a significantly faster build. A whole lot of steps. Let's see what they have in Cargo.toml. Why do we need HTTP requests? Interesting. I don't... None of... Oh, we're going to need it for cloning a repository at the end, right? Then you need to actually read things from the remote. I'm going to just comment those out right now because we don't need them for this. I'm also fine using anyhow instead of thiserror. So we'll do this just to... All tests pass. Great. Woo! It's happy with us. Great. So we did the right thing. So now the question is, can we make this a little bit better? So instead of doing this by reading into a buffer: the moment we've read the header, we don't really need to do anything else. We can just stream the rest of the decoder directly to standard out. And that is indeed doable. So here's what we'll do in order to accomplish that. We'll do a match on kind, because this might only be possible for blobs, for example. But if what we get is a blob, then we should be able to stream it directly out. And what we'll actually do is io::copy, which takes a reader — in this case, the reader is going to be z — and a writer, which is going to be &mut stdout down here. And then n is going to be that. And we'll do here... "write .git/objects file into stdout". And then at the end here, after printing it all out, we'll do this assert. Now, the downside of doing it this way — I think I can now get rid of most of this — the downside of doing it this way is we might actually end up copying a lot more. Imagine that the file is actually way longer than what the size dictates. Then the copy here is still going to copy until the decompressor runs out of stuff. And this is how you get to things like zip bombs, right, where you just keep reading as long as the decompressor is giving you stuff, even if there's a header that says how much there should be there. And so this is just going to write everything out. There is a way to get... I forgot: the one-byte read. You don't need the one-byte read here, the thing that checks that we hit end of file, because copy is going to run until it hits end of file anyway, and then it's going to tell you how many bytes it actually copied over. And so I guess this should be size. "had n trailing bytes", where that's really going to be... "file was not the expected size"; expected is going to be size, actual is going to be n. Like this. Yeah, so we don't really want to do a sort of unguarded read here. There are a couple of ways around this. So for example, it could be — we'll see whether that's the case here. This one does not have it. Decompression settings. Let's see here. This one doesn't. Okay. So there are some of these libraries that actually provide you with a way to set a limiter.
Given that we don't have that, what we can do is actually create a limit reader. And it is going to hold a... And there are crates that provide this as well, but we can write it pretty easily ourselves, so we might as well. The reader is going to be an R. Limit is going to be a useSize. And then we're going to implement read for limit reader, where R is read. And we'll grab in here, implement with signal remembers. And in fact, we're also going to implement the default ones specifically because we want to forward as many of these as we can. Well... Well, I'm going to go ahead and do that. All right, well, we'll not do that right now. We'll leave that to other crates to do. My thinking here was that we really want, like if the underlying reader here is an optimized like vectored read, for example, we would really like to make use of that. But in the interest of time, I won't do that here. There are crates that provide exactly this type for this reason. But what we'll do is we'll have a... When you do a read, you'll do let n is self.reader.read into buff. And then if n is more than limit. In fact, we're going to shorten this even more. We're going to say if buff.len is greater than self.limit. Then buff is equal to, so this is going to be mutable, then buff is going to be equal to mute of buff to self limit. so that we never read more than what we're allowed to. We're then going to do the read into buff. And then we know now that this will never be more than n. But then we'll do self limit minus equal n. And then we'll do OK of n. So the idea here is, there's one bit that's going to be missing here, which is we really want to error if there's more. I think what we'll actually do here is we'll do plus one. And if n is greater than self.limit, then we will do a return. I o error new. I o error kind. I guess we'll do other too many bytes. And so now what we'll do here is we'll actually say that mute Z is, in fact, we can do that out here. Z is a limit reader over the original Z with the limit of size. And so now we can guarantee that we'll never read more than this size out of the reader, ever. If we do, then we're going to get, if the reader produces more bytes, then we'll hit this error right here. Right? And so now this copy is going to read through the limit reader, which means it'll be limited. It might still produce a value that's lower than size, in which case we want to error. But if it produces something that's more, then it'll hit this too many bytes error. And so in theory now cat file should still work. And it's going to complain about this bit because the limit here needs to be a U64, almost certainly. This then should also be a U64. This U64 is fine here. I guess the limit here should be useSize. And then this should be as useSize. That's fine. So it still now works, but if you were to get an object file where the size header didn't match, you would get an error regardless of whether it was too long or too short. Could you use read or take here? Ooh, you might be right. Uh, tick. An adapter which will read at most limit bytes from it. Yes, indeed. Although, although, so the difference here is that this will return end of file. It won't error. I guess that's okay. It's kind of nice to get the error, but in the interest of shorter code which is easier to reason about, we can choose z is z.takeSize. Has the same effect. Good call. I'm going to try to get the error. 
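For reference, the LimitReader described above would look roughly like this. It's a sketch with my own naming and a slightly simplified cap calculation; as discussed, Read::take from the standard library gets you the same cap with less code (it just returns EOF instead of an error), and that's what we end up using.

```rust
use std::io::{self, Read};

// A reader that refuses to hand out more than `limit` bytes in total,
// erroring if the underlying reader keeps producing data past the limit.
struct LimitReader<R> {
    reader: R,
    limit: usize,
}

impl<R: Read> Read for LimitReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Never ask the inner reader for more than limit + 1 bytes; the one
        // byte of slack lets us detect "too long" rather than silently
        // stopping exactly at the limit.
        let max = buf.len().min(self.limit + 1);
        let buf = &mut buf[..max];
        let n = self.reader.read(buf)?;
        if n > self.limit {
            return Err(io::Error::new(io::ErrorKind::Other, "too many bytes"));
        }
        self.limit -= n;
        Ok(n)
    }
}
```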
Note, this won't error if the decompressed file is too long, but will at least not spam stood out and be vulnerable to a zip bomb. And just see this still works. It does. I can now get rid of the IO thing. Let's do add cargo lock and cargo TOML first. Comment out big depths for now. And down here. Mitigate zip bomb. Big push. I've educated chat so they've surpassed me now. Chat now is telling me about all the things I didn't know about Rust. Yeah, so see how much faster the build was here? Because we got rid of all those extra dependencies. Nice. We might have to add them back, but at least we don't have to wait for them for each stage until that last one. Okay, let's go see what the next exercise is. I guess we're supposed to be doing exercises. Create a blob object. You'll implement support for creating a blob using the git hash object command. Use to compute the SHA hash of a git object. When used with the W-W flag, it also writes the object to the git objects. Oh, I see. So this produces the hash that this file would have created if it were added to git. And then with dash W, we'll also write it into git objects. Nice. Okay, that should be pretty easy. Yeah, yeah, that's fine. We just, it thinks we're at the next exercise. So it goes that failed, but you see stage two succeeded. Okay. So we have a, what's it called? Hash object. Right. And then this is going to be a file, which is going to be a path. Okay. So down here now, command hash object, write and file. Let's see what we get into. So what are we going to do? Well, we need to compress it. And then we need to add the sort of... size field to it. No, the other way around. We need to add the size header, and then we need to compress it, and then we need to hash it, right? So the inverse of what we did for print here. So the header is going to be... Ah, see, here's the awkward part. We don't know the size until we've read the bytes. Which means we can't start writing, we can't start creating the hash until we've read the bytes. Because the hash is going to include the size and it comes before the contents. So this is why often these formats tend to have the size at the end rather than the beginning. Because that way you can do it in a streaming way. But if you have it at the end instead of the beginning, then when you read, you can't pre-allocate the storage. So like, you have to choose either writing is annoying or reading is annoying. And in this case, writing is annoying. Which is fine, or rather, when I say annoying, what I mean is a little bit less efficient, right? Because you need to read all the contents of the memory in order to produce the hash you're eventually going to print. You can't just stream it through, but you can stream reads. And so, given that you read more often than you write, that's probably a worthwhile trade-off. Yeah, we can stat the file to get the size ahead of time. But that doesn't work, for example, if file here were standard in. So my guess, for example, is that hash object also allows you to pass things from standard in. Ooh, does it not? Oh, it doesn't. Well, okay, then ignore me. Then what we will do is stat the file. So we'll do metadata of. file context stat file and we can actually here be a little bit more helpful right so we can do with context format and then actually print out the path to the file and then the we'll do a hasher is going to be, so I think we have the SHA-1 crate here already. I think that was already in the cargo toml. Yep. So we're going to create one of these guys. 
And we'll add in this. And I think we already have the hex crate — although, actually, looking at the Cargo.toml... I seem to remember something about... because there's hex and there's hex-literal. hex-literal, I think, is maintained by the RustCrypto community — crypto as in cryptography — who are the same ones who maintain things like SHA-1, and so that's why they're using that here. I don't think it matters too much either way. We'll use hex, that's fine, given that that's already in our Cargo.toml. So we can just leave that. So we'll go down here. The hasher is going to be that. Then we're going to do... sorry, hasher.update. We're going to write "blob", a space. And then hasher.update with the size, which is going to be a format of stat. — Rust Analyzer doesn't like me now — stat.len(). And then we update it with the size, and then we update it with a literal zero byte. I wonder whether update allows that as a bare u8? No, so I'll do this then, like this. And then we want to stream in the actual contents of the file into... no, this is not even true, because we need to compress it first. So ignore me for a second while I grab the Zlib encoder. Oh, there's a --stdin flag for reading from standard in. Okay. We want the Zlib encoder. We'll grab the same things here. And we want one of these guys. So, ah, so this is where it's going to be. Yeah, okay. So we create a Zlib encoder and we're going to write_all "blob", a space. The compressor is going to come at the end. The hasher is also going to come at the end. We're going to write_all that. We're going to e.write... in fact, I think e here just implements Write, so we can just do this instead. It's a little bit nicer, and that way we don't need the format; we can just write, into e, stat.len(). And that also means we can then write... I think... we can just do this, like so. And this is going to have a question mark. That's fine. So the compressed thing here is a Vec<u8>, right? So here, you give it what you want it to write the output into. And so here, really what we would do is construct the file that we're going to output into, but we also want to keep a running hash of the thing that we're writing, so that we get the hash at the end. So what we actually want here — and here I am going to claim that we need our own implementation of Write. So we'll do a HashWriter. I mean, unless the SHA-1 crate has one, but I'd be very surprised. Yeah, it does not. Okay, so we're going to have HashWriter, which takes a W. It has a writer, and it has a hasher, which is going to be a Sha1. And then we're going to implement Write for HashWriter where W implements Write. Implement missing members. And again here, there are a bunch of other methods you might want to pipe through; we're just going to skip them for now because the efficiency isn't that important to us. So here what we want to do is self.hasher.update(buf) and then self.writer.write(buf). And actually this is not quite right, because... we actually need to do this: buf up to n. Because it could be that the writer does not write all of the bytes that it's given, and we would only want to update the hasher with the bytes that were actually written, because the rest are going to be given in again the next time around. So this is actually — I'm guessing someone has been tripped up by this in the past. And so then we're going to return Ok(n). And flush is just going to be self.writer.flush(). We don't need anything special for that. And so now our writer is going to be one of these hash writers, roughly as sketched below.
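Here's roughly what that HashWriter looks like — a sketch matching the description above, assuming the sha1 crate's Digest API; note the `&buf[..n]` detail, which only hashes what the inner writer actually accepted:

```rust
use std::io::{self, Write};

use sha1::{Digest, Sha1};

// Wraps any writer and keeps a running SHA-1 of the bytes actually written
// through it.
struct HashWriter<W> {
    writer: W,
    hasher: Sha1,
}

impl<W: Write> Write for HashWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.writer.write(buf)?;
        // The inner writer may accept fewer bytes than it was given; only
        // hash the ones it took, since the rest will be handed in again.
        self.hasher.update(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.writer.flush()
    }
}
```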
And this is going to be annoying, isn't it? Yeah, the generics here are going to come bite us, which is a little frustrating. So we'll have a function for this. I'll show you what I mean. So what I really want up here, right, is writer is if write else. If write, then we want. Oh, actually. The thing we're going to have to do here is file create temporary. I'll talk about why we need temporary in a second. And otherwise it's going to be a vecnew. In fact, it's not even going to be that, it's going to be unit because we don't actually care about the bytes that are written, we only care about the hash. And then what I want here is given writer, the problem as the compiler helpfully points out is that the if and the else have different types here. And so writer doesn't have a well defined type, it has one of two, which means Zlib encoder has one of two because it's generic over the writer. And you know that we could put all of this code inside each branch. But instead what we'll do is just have an FN. In fact, we can define that up in here. Write blob. It's going to take a W. And the bits that it is given is the size. and the file size here being a u64 file here being a reference to a path in fact file size and writer and it's going to return an anyhow result like so and then up here we're going to go up and grab this put that in here and so now if write then we're going to do write blob of file and stat.len stat.len and this write blob object to disk. And really this is to temporary file. And again, I know I've promised to explain why we have that. So I'll do that in a second. And otherwise we're going to do this. Like so. And then down here, this is going to return okay, nothing. So inside of here, we're going to take the writer where W is right. And we'll grab in path here. I was pretty sure unit implements right. Does it not? Why? It should. Right? Really? There's a sync type instead of it just being for unit? All right, fine. I'll try to sync then. So this is a writer which will move data into the void, i.e. it does no actual system calls. Generally called by calling sink. Okay, fine, fine, fine, fine. Like this. So inside here... We're going to open a Zlib encoder over the writer that we're given. We're going to write out blob space and then the size. And in fact, that means this bit can also move in here because no real reason for it not to, at which point we don't need to take the size anymore and these don't need to pass it in. Oops, one too many. And that's that. And at the end here, we're going to call e.finished. Compressed gives us back the writer after we finish the encoding. And the hash now is compressed. So the writer is actually going to be one of our hash writers here. So hash writer. of writer and hasher where the hasher is a shao1-new like this and so now the hash at the end is compressed dot hasher dot uh what is it finish for shao1 dot finalize uh and then this i guess we will do um you here because these write blobs return an error, which is going to be write out blob object. And this is really construct temporary file for blob. Right, and at the end here I guess we get the hash, so we could really just return the hash here instead. What is the type of hash? It is something... ooh, not what I meant to do. So in our SHA-1... SHA-1 here... SHA-1 core. Is there an easy way I can name the output here is what I want to know. Fixed output core. Output size user. I just want something that I can name as the output type here. 
That's fine, I can have it be hex instead, I suppose. So this will then output hex::encode of the hash. That's ultimately what we get out here. And so this is then going to be let hash is this. And this, actually, I'm going to have to explain now, I think. So here in the write case, the hash we get out — we wrote to a temporary object because we don't know the hash until we've run through the entire input file. And so we don't know where we want to write the output to until we're done reading the input. So we have two options: either we do it all in memory and then write out to the final file, or we write to a temporary file and then move the temporary file to where it's supposed to be. And in this case, I'm going to do the latter. So we're going to do create_dir_all — oops, yep — of .git/objects/ plus hash[..2]. All right, so the first two characters create a subdirectory of .git/objects. And then we're going to do std::fs::rename of... and this, let's for now just say it's called "temporary". Realistically here, you would use an actual thing that generates random temporary file names. For now, I'm just going to pretend that we have one of those. And then we're going to move it to the final location, which is going to be this plus hash[2..] onwards. Write out, or rather move, blob file into .git/objects. And in fact, I think we want to get the hash out of here. Right. So this is going to do this, this is going to do that. And then here, regardless of which path we took, we do want to print out — and it looks like hash-object prints with a newline — so we'll print out the hash regardless of which path we took. Okay, so now let's just see if this does the right thing. So if we run hash object Cargo.lock, what does it do? Well, it printed a hash, but it's not the same hash. So that seems problematic. As the list of implemented commands is growing, why not split the command processing of each into individual dedicated functions? We'll probably do that. There's no real reason to have them all be directly inline here, I agree, and it's going to get unwieldy as we grow. Same thing as there are probably some things we can reuse — like, for example, this way to construct paths. There's no real reason you want to repeat it; quite to the contrary, you probably want to share it. I think for now this is okay, but the moment we go to the next exercise, I'm going to split them. Okay, so we did something, but it's not quite right. We end up with a different thing than Git does, which I think suggests that what we actually want to do here is... I want to do a dash w. And then I want to do a cargo run with dash w. And then I want to diff .git/objects 31e..., which is what Git actually produced, against .git/objects 2a70..., which is what we produced. So it did actually write to the file, that's good. Okay, binary files differ. Yeah, that's fine. So let's do a hexdump of each of them. A hexdump of this guy. Oh, I want to decompress them first, which is zcat — I think zcat can do this. Can it? Oh, I don't remember. There's a command equivalent to zcat but for zlib and it is called... anyone remember? I don't have xxd. Really, Neovim doesn't ship with xxd? That's really annoying. Yeah, I could write the hexdump myself, but I also want the decompression part. There is a command for this, I'm just blanking on the name. In fact, I actually wonder whether I can just have vim open the file. No, I did not want to do that. No, hexdump.
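(An aside for anyone following along: there's no ubiquitous zcat-for-zlib on most systems, but a throwaway helper covers it. This isn't from the stream, just a convenience sketch assuming the flate2 crate.)

```rust
// zlibcat: read a zlib stream on stdin, write the decompressed bytes to stdout,
// e.g.  zlibcat < .git/objects/2a/70... | hexdump -C
use flate2::read::ZlibDecoder;
use std::io::{self, Read, Write};

fn main() -> io::Result<()> {
    let mut out = Vec::new();
    ZlibDecoder::new(io::stdin().lock()).read_to_end(&mut out)?;
    io::stdout().lock().write_all(&out)
}
```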
I don't think hexdump has a... "uncompress zlib data"... Wow. That's awful. That's so awful. I mean, I guess that works, but that's just terrible. Wow. Wow. Okay. That's so stupid. Um, okay. So that's what it gives for our file. And then what was the other one? 2a... wasn't it 2a... too hard to scroll... 2a70 is ours. Interesting. Well, that's certainly different. So notice that the one from Git has the actual file contents and ours just does not. Interesting. Why do we not... Oh, that's because we don't actually write the bytes at all. We write the header, and then we don't write anything else. We need to std::io::copy of... That's so stupid. Okay, std::fs::File::open of file — we'll grab this same one here. So of course they're different, because we didn't do it. So we're going to read from file into e, with a context of, I guess, stream file into blob. This needs to be a mut. Okay, let's try that one more time. How about now? Okay, so it's different. It's still wrong, but it is different. Okay, let's now see what we get this time. So, 51a is now ours. Okay, that's better. And what is the one that Git produces? Git produces this one. Okay, so our sizes are the same, right? blob 11792, blob 11792. "This file is automatically generated by cargo." Interesting. So how are these different? Let's do a hexdump of this, and then — I think I only need the first few. What? I'm apparently bad at files. Okay. So that's Git's and this is ours. That all looks the same to me. diff a b — they are the same. Ah, is the SHA-1 of the uncompressed contents? Because that would certainly make a difference. If it is the hash of the... Yeah, the notes at the bottom: the SHA needs to be computed over the uncompressed contents of the file, not the compressed version. Okay. That actually makes this slightly more annoying, just because it means that what we really want is a hash reader rather than a hash writer. It's not the end of the world. So we just switch this around. We implement it for a reader R. Let's keep this, because we're going to need almost the same thing. Why not wrap the Zlib encoder with your hash writer? No, because the whole point is that we want the hash of the input, not the hash of the output. And it doesn't include the header either, I don't think. In fact, that's going to be kind of weird. If the hash is... in fact, we can find this out, right? What is the sha1sum of Cargo.lock? Okay, what is the sha1sum of this file, like the decompressed one? Okay, so it's with the header, but it is uncompressed, right? So you see this hash here in the file name matches this hash here. So it is a hash of everything, including the header. And so therefore, we could just have the hasher operate on the... yeah, we can still do it on write; we just need to do it before the compression rather than after. So we want this here. So this is going to be... Yeah, so you need to invert this in your head, right? Because when we do a copy here, the writer that's hit first — the thing that gets to see the initial input — is the last writer we add to the stack here. So the ZlibEncoder is the encoding writer, and then we wrap that in our hash writer, and our hash writer is then going to see the uncompressed bytes and then forward them to the ZlibEncoder, which is going to produce the encoded bytes.
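In code, that layering looks roughly like the sketch below — a hedged reconstruction rather than the exact stream code, assuming the flate2 and sha1 crates plus the HashWriter sketched earlier. The copy writes into the HashWriter first, so the SHA-1 covers the uncompressed header plus contents, and the HashWriter forwards everything to the ZlibEncoder, which produces the compressed output:

```rust
use anyhow::Context;
use flate2::{write::ZlibEncoder, Compression};
use sha1::{Digest, Sha1};
use std::io::Write;
use std::path::Path;

fn write_blob<W: Write>(file: &Path, out: W) -> anyhow::Result<[u8; 20]> {
    let stat = std::fs::metadata(file).context("stat input file")?;
    // the HashWriter wraps the encoder, so the hasher sees the
    // uncompressed "blob <size>\0<contents>" stream
    let mut writer = HashWriter {
        writer: ZlibEncoder::new(out, Compression::default()),
        hasher: Sha1::new(),
    };
    write!(writer, "blob {}\0", stat.len())?;
    let mut file = std::fs::File::open(file).context("open input file")?;
    std::io::copy(&mut file, &mut writer).context("stream file into blob")?;
    writer.writer.finish().context("finish zlib stream")?;
    Ok(writer.hasher.finalize().into())
}
```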
So here we do writer, hash is then writer.hasher.finalize and this is writer.writer.finish and we don't actually need the writer back and this needs to be mutable and this does not okay let's try that again that's the right hash amazing amazing All right, so now we have the same hash as what Git does. And just to sort of, I mean, there's no, there's nothing to diff here, right? Because the hashes are the same, and we would overwrite the same file. Okay, great. So now we have hash object working. And at least in theory, if I do this. It prints it, but hopefully did not do anything to get objects 31E. So that was last modified. In fact, I can just stat that file. And then I'll run our thing without dash W and then stat it again. And the modification time is not changed. And if I do dash W and then stat it, then the modification file has changed. Okay, so now we have a hash object that seems to be doing the right thing. Amazing. We can remove A and B. And our diff here is implement hash object. Git push. See what it does. All tests pass. This string of your dumpty, yikes, dumpty, donkey, dooby, donkey. What a string. Okay, fireworks. Amazing. Next stage. Read a tree object. You'll implement ls tree command, which is used to inspect a tree object. Okay, so before we do that, let's do a little bit of splitting here. So let's, in fact, make modules. create that. And then I want cat file. And I want hash object. And inside of here, bump bump inside of here, I want to grab all of cat file. That's going to go in here and that's going to be something like, I guess, invoke. And that needs to take pretty print. And here we could introduce like an object with, uh, with options that we could then forward on to clap, but I don't think we're quite at the point where we need that yet. Uh, object hash here is going to be, um, only really needs to be a string reference. Uh, and this returns an anyhow result of nothing. Well, I typed too fast. Uh, and then this goes in here. This goes away. This goes away. These go away. These go away. These go away. And then kind also is in here. It's the only place it's used. So cat file here is now going to do command cat file invoke of pretty print and object hash. Commands. Yeah, I can borrow. And then if we also look at command hash object, we'll do the same thing, which will take this bit and we'll do fn invoke returns anyhow result of nothing. And it takes the arguments write and file. So write is a bool and file is a path. And this is now going to call commands hash object. invoke of write and file. Right. And I guess we'll do here a pub create. I also can't type apparently today. Hash object is going to get these. Same thing with the writer that we have at the bottom. That's also going to go in here. And if we go into commands, these are both going to be pubcrate. And cat file is going to be pubcrate. Oops, pubcrate. And hash object is going to be pubcrate. And it's yelling at me for something. I've missed my syntax here. This should be a colon. We go path buff goes away. This goes away. This goes away. These go away. We now go back to main file takes a reference. Amazing. And just check that that still hasn't broken anything it has because cat file needs an okay at the bottom. And main now doesn't need most of its imports, which is nice. Like so. Split commands into mods. When not streaming for day-to-day coding, do you use Copilot? No, I do not. I've just never really found a use for it. Maybe it's just because I haven't integrated it with my editor, but... 
I like my bottleneck is not usually actually writing the code it is thinking about what I want to write and how to do it well okay so now we have this split into mods so now we want to add the next command which is LS tree LS tree as take any arguments I wonder okay well we'll look at the tree object later No arguments except maybe name only. Okay, so clap long. Name only is a bool. So we're going to have here ls tree name only. That's going to be ls tree invoke. Name only. We're going to go here. We'll do this. Create that module. And we'll go grab the start of this from here. And this is where we're probably going to start to see some reuse between cat file and LSTree, right? Because LSTree also needs to read out something that is stored in the object tree. Or in the object store, rather. Right, so the output of ls tree would be tree shaw. Yeah, so this is the thing we looked at earlier, right? So if you have a nested repository, a directory structure like this. the actual tree object only store the listings for one level deep. So that means if you do lstree of the tree of your repo, you would have file one, dir one, and dir two, and nothing else. For the things that have subtrees, like for anything that is a directory, the hash that's listed for that entry in the tree blob or the tree object is a tree hash that you can then recurse down into. Whereas anything that's a file is a blob hash. So, right, so the caller that wants to print out something like this would actually need to walk the tree objects going all the way down. The app is alphabetically sorted. This is how Git stores entries. That's fine. With the name only flag. Okay, so we are going to here, name only. We're going to anyhow, ensure, name only. Only name only is supported for now. And eventually we'll return one of these. Right. So we're going to get a tree shaw and we're just going to print the names and nothing else so that we don't recurse down. We recommend implementing the full LSTree output too since that will require you to parse all data in a tree object not file names. Ok so it's optional whether to also print the hashes but we are allowed to. Great. Let's look at what a tree object looks like. Trees are used to store directory structures. Multiple entries, yup we know that. The name, the mode. The piece of value is this. Interesting. So are they actually stored in ASCII? Aha! We have a command to check this now. So if I now do a git rev parse head. No, git cat file. No. Commit of this. So this has this tree. So let's now print out... 60 slash 46. Interesting. Oh, that's because that commit didn't actually commit anything. So do a rev parse of head. Let's do this guy. Cat file this commit. Tree here. So if I now print out the tree at 6790. Oh, that's because I'm passing it to Shosam. Okay. Interesting. So there is something binary in here, right? Like this is the contents they're talking about, but what I'm trying to figure out is whether that's actually what's stored in the file. But it sounds like what's actually stored in the blob is this thing, right? So we see the object prefix, the object header here. And then this is what's actually stored, but this is binary, this is not ASCII. So it's not actually ASCII that's stored in the file. Tree object storage, here we go. Tree object is stored in the git object directory. That's fine. Looks like this after zlib decompression. Tree size 0, okay great. Mode space name, null byte, and then a 20 byte SHA. 
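To make that layout concrete, here's a small made-up example of the decompressed bytes of a tree object — the names are invented and the hashes are fake placeholder bytes, but the structure is the one just described:

```rust
// "tree <size>\0" header, then repeated entries of
// "<mode> <name>\0" followed by the referenced object's raw 20-byte SHA-1
// (raw bytes, not hex-encoded).
fn main() {
    let entries: Vec<u8> = [
        &b"100644 hello.txt\0"[..], // a regular file (points at a blob)
        &[0xAB; 20][..],           // its SHA-1, as 20 raw bytes (fake here)
        &b"40000 src\0"[..],       // a directory (points at another tree)
        &[0xCD; 20][..],           // the subtree's SHA-1 (fake here)
    ]
    .concat();
    let mut tree = format!("tree {}\0", entries.len()).into_bytes();
    tree.extend(entries);
    // `tree` is what gets zlib-compressed into .git/objects/xx/yyyy...
    println!("{} bytes total", tree.len());
}
```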
I see, and the 20 byte SHA is stored as an actual, like is not stored hex encoded. Great. Yeah, exactly. Not hexadecimal. Okay. Amazing. So in that case, I think this shouldn't be too bad. I think what we actually want here now is if we go back to main here, we'll add a And then we'll go to a cat file. And we'll pull out some of this. So I'm going to figure out what we actually want to pull out. So we want something like read object. It'll take a hash and it'll return an anyhow result of something. And in fact it'll return a tuple of kind and... kind and an impulse read I think or buff read maybe I think is actually what we want so when we get down here yeah let's have it also return the size actually which means we'll probably want to correct here and we'll say this is an R. So it returns a kind. It returns expected size, which is a U64, and returns a reader, which has the remaining bytes of type R. And what this will actually return here is an object. And I'm using impl here because I don't necessarily want to commit to exactly the order of operations in here and which wrapper types we use, for example. And so now I think we can then return here, okay of object where the reader is Z, the kind we have, the size we have, expected size is this. And this no longer needs to be mute. This is pubcrate, this is pubcrate, and this is pubcrate. And now in cat file, we can use... create objects... We only really need read object here. And I guess actually, let's do a... Let's have that be an imple on object actually. It's going to be a little bit weird as an imple on object actually. I guess I can do this. Because otherwise the R on the outside doesn't actually matter here. So we'll do this. And that way this can be object. And we'll also take in kind. And then we can do here now. That object is object read of the object hash dot context. Parse out object file. And we can go down here. And we can match on the object.kind. And if it's anything, I guess here we could now have cat file understand how to print trees. But for anything else, we'll do an anyhow and anyhow bail. don't yet know how to print, to print trees, print these object of kind. And then we'll do, we'll add a couple of things to this. So this can derive debug. It can derive a partial E can E. And we can also just do imple display for kind, because why not? We'll bring instead here, we'll implement this, and this is just a match on self. There are crates that give you this, but it's not too bad to just do yourself here. Like so. And so now you can print an object.kind here. And we're going to go ahead and let this be pub create all the fields. And so now this is going to be object.reader. This is going to be object.expectedSize. Like so. And that means I need the object to be mutable because I need to consume the reader. Nice. That makes this a whole lot nicer. And this says unreachable pattern because we don't have a tree yet. We'll have a tree now. Tree is never constructed. That's totally fine. This is tree. How to parse or to process a kind we might as well add commit here to actually while we're at it, because we know it's gonna we know it's gonna come right. So commit is also going to presumably have at least the same structure for the the header and the compression. So this is more of a, what even is a? Great. So now we have a sort of generic object reader. The pretty printer in cat file just knows about blobs. And the thing in LSTree is a to-do for now. Just see that this still works. Yep, so we'll do a commit. 
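Here's roughly the shape the shared reading path takes after this refactor — a sketch, not the exact stream code, assuming the anyhow and flate2 crates and the Kind/Object names used above:

```rust
use anyhow::Context;
use flate2::read::ZlibDecoder;
use std::ffi::CStr;
use std::fmt;
use std::io::{BufRead, BufReader};

#[derive(Debug, PartialEq, Eq)]
pub(crate) enum Kind {
    Blob,
    Tree,
    Commit,
}

impl fmt::Display for Kind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Kind::Blob => write!(f, "blob"),
            Kind::Tree => write!(f, "tree"),
            Kind::Commit => write!(f, "commit"),
        }
    }
}

pub(crate) struct Object<R> {
    pub(crate) kind: Kind,
    pub(crate) expected_size: u64,
    pub(crate) reader: R,
}

// Opens .git/objects/xx/yyyy..., strips the "<kind> <size>\0" header, and
// hands back a reader positioned at the start of the payload.
pub(crate) fn read_object(hash: &str) -> anyhow::Result<Object<impl BufRead>> {
    let f = std::fs::File::open(format!(".git/objects/{}/{}", &hash[..2], &hash[2..]))
        .context("open object file in .git/objects")?;
    let mut z = BufReader::new(ZlibDecoder::new(f));
    let mut buf = Vec::new();
    z.read_until(0, &mut buf).context("read object header")?;
    let header = CStr::from_bytes_with_nul(&buf)
        .expect("read_until(0) leaves exactly one NUL, at the end");
    let header = header.to_str().context("object header is not valid UTF-8")?;
    let (kind, size) = header.split_once(' ').context("object header has no space")?;
    let kind = match kind {
        "blob" => Kind::Blob,
        "tree" => Kind::Tree,
        "commit" => Kind::Commit,
        _ => anyhow::bail!("unknown object kind {kind}"),
    };
    let expected_size = size.parse().context("object size is not a number")?;
    Ok(Object { kind, expected_size, reader: z })
}
```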
Pull out object store reading. Okay, do we need this to be generic or would it be simpler to just box the reader? Yeah, so up here in objects, instead of this storing an R, we could just have a boxed in reader here. We didn't read. I don't really want to do that because well. It's fine for this to be generic and only ever be instantiated with one type, and that way you don't get the indirection via box, so it's more efficient, but you also get the code to be able to take, like you get monomorphization, right? And it doesn't really feel like this is particularly painful, right? Sometimes generics get really painful and boxing it just makes the pain go away. There's not really any pain here, at least the way I see it at the moment. The other nice thing here about using the monomorphization is We can, we get the stuff from buff read, which is going to be pretty nice. And you also, well, I guess you could get the buff read stuff through dynamic dispatch as well. I think that trait is object safe. But I don't think there's really any win here from switching this to dynamic dispatch, at least at the moment. We could always switch it later. Okay, so now let's go and see if we can do LS tree. So, ls tree, huh? What does it, what do you give it? You give it a tree shaw. Ah, so in main, this also needs to take a tree hash. Uh, and similarly then, you know, this needs a tree hash. This needs a tree hash. LS tree needs a tree hash. This needs a tree hash. This is now a tree. The only thing we're going to be willing to grab in here is a kind tree. How to. LS, a different kind. And then this bit in here, it is TBD. I'd also be tempted to call that type read object so you can have a write object later. Yeah, it's not a bad idea. The object that we have here is more like an object ref, really. The cool thing actually here with this being generic in this way is you could also have the reader here be a vector. So you can have an object entirely in memory, which would actually also work for write. In fact, this doesn't even need to be a reader. This could be a file that you write into. It's a little weird, but it could be. I'm going to leave it as object for now. We'll see how we adjust it over time. Okay, so let's now go back to this bit here. So the representation is this bit, which is the header, which we already read out. The actual file doesn't contain new lines. Okay, good to know. So these are going to be... All right. And how's the mode encoded? Is the mode encoded in... decimal. Okay, great. So here's what we have to do. We have to, this sort of alternates, between mode name and 20 by Chaz. And I think the easiest way to split this is probably to read until a null. Yeah, I think this is actually just a pretty simple loop. So we do a loop where the first thing we do is mode and name. And we can have a buffer here, which is the thing we're going to read into. So mode and name is going to be object.reader. Okay. And we want to read until we hit a zero, really. And in fact, yeah, give me buff read. So this is going to read into the buff. And in fact, we're going to buff.clear at the start of each of these loops. We're going to read until we get a 0. And this is going to be n. If n is equal to 0, then we break. Because that means we've hit the end of file. Provide the argument context. Oh, mute. What is it? Oh, here. Read next tree object entry. So here, we know that what's now in buff is going to be the mode on the line. And so what we'll do is, in fact, we could have two different buffers here. 
We could have one, which is going to be mode and line. And we can have one, which is sha1 hash. So this first thing is going to be.readUntil. And does readUntil allow me to write into a string? There is a version of this that does. Um, but it's not going to be nice. Is it because it's a C string instead? All right, fine. Uh, we won't do that. Take that back. Um, so we're going to read until this, then we're going to say, um, mode and line is C stir, um, new with, wow. What was the name of it? Uh, from bytes with null from bytes with null of buff. And then we can do here invalid tree entry. And here's actually an interesting question, right? So one thing that's interesting about C strings is that they're not guaranteed to be valid UTF-8. So you can imagine running this command on Windows, for example, where file names are not encoded as UTF-8. Or in fact on a Linux system where it's configured in the same way. And so the byte string that we get out of here is not necessarily compatible with a string or a str reference. It truly is a C string, whereas it's a sequence of bytes that don't contain a null. And so the question here becomes, how do we want to print these? I think what we're going to do here is just print them out raw maybe. So we do know that mode and this should be mode and name. Mode and name is going to be mode and name dot. Oh, isn't there a. I thought. 2 bytes. And then I want to... I want a split once. Why is there not a split once? I think the mode lines are actually guaranteed to all be the same length. That's an interesting question, actually. Mode, Like, I think they are always six characters. Realistically, they really should be just... We should just look by space, I suppose. So there is a split first. No. I'm fairly sure there's a rust slice. That's not what I wanted to do at all. Give me the splits. Do I really need to do a find and then a split? Come on. Well split does what I want. And I guess I could do a split end but I want a split once. Yeah, split once. Oh, it's nightly only. Fine, fine. We'll do a split n for now. Uh, two and, uh, this is going to be b is equal to space. Which I think technically is like is ASCII white space, but I think it's specifically we want it to be a space here. Let me go ahead and do this. And do to do place with split once. So. bits is going to be this let mode is bits dot next dot expect uh split always yields once uh let name is bits dot next uh dot okay or else uh anyhow anyhow um this is like a tree entry has no file name Great. And this should be this one. This can be this one. Okay, so we have the mode and the name. And the next thing now is the hash. Now the awkward part is that if we read more into the buffer, it invalidates our borrow of the buffer, which is where mode and name here live. In theory, we could sort of read into the tail end of the buffer, but we're not guaranteed that that won't reallocate and thus invalidate the references. So I think what we'll do here... I guess we could just print out the name before we have the hash. I suppose that's okay. Well, what's the expected output format here? It's like We're supposed to print mode space. Oh I guess for now it's name only. 
So let's do the name only so that means this will just will Will do stood out dot write all name write tree entry name and notice here we don't actually require UTF-8 right we just write the bytes directly out to standard out and if they happen to not contain UTF-8 that's up to the terminal to deal with and then we'll also do a write line to standard out of nothing Write new line. A little sad we can't combine these. And then we just don't do anything about the mode. Because we would only print that if we didn't have name only. And then this can go away. We're not going to do that anymore. And then now we're going to buff.clear. And then we're going to... Actually, we don't need to do that. We can do the following. We can just read them all into the buffer first. That's what I want to do. So here, we do a read until, if we hit end of file, then that's fine. We break. And then we do n is object dot reader dot read. Why isn't there a read exact? Because what I really want to do here is I want to take the buffer and I want to... I want to take the object reader and I want to read another exact. and I want to read sort of from n to n plus 20, right? There is a way to do this. I'm just getting the compiler to help me here for a sec. I didn't know that's why. Okay, so read exact. And in fact, I guess I can actually just use a different buff here. So I'll do hash buff is a 020. And I want to read into hash puff. Read tree entry hash. Tree entry object hash. And that is all that's stored in there, right? Yeah, okay, great. And here too, if n is less than, it is 20 bytes, right? Or is it 40? 20 bytes, yeah. Is less than hashbuff.len, then break. Expected because this is unit. Ah, read exact will error anyway, if the number of bytes available is not enough to fill hash buff. So that is fine. And so now we actually do have access to this at all times. So we're good. Okay, so this writes the new line. And then we loop. Are we allowed to print a trailing new line here in the output of LS tree? Alphabetically sorted just because that's how it's already stored anyway. Are there any notes? Yeah, okay. I think that's all, right? We have the hash here. We just choose not to print it. In fact, we could say that the hash is hex and code of hashbuff. So down here we could say I do need that to be here. And then we could say if so now we don't need to enforce name only. So if name only, then this one's easy, we just do this. Else and we can have the new line printing at the end anyway, else we encode we decode the hash and we print out. So for the long output of this, it's supposed to be mode space, the kind which we don't know yet. And that's presumably why this one is semi optional, right? So we're going to write all of the mode. And then we're going to write all of the name. And then I guess actually in the middle here, we're going to write space. In fact, we could just write out buff here. Right, because that's already how buff is structured. So we can just do buff here instead. Tree entry to stood out. But the interesting next bit is then we want. Oh, no, we can't do that because we actually want to write the mode and then the hash and then the and then the file name. So we don't actually want to write buff. We want to keep what this was. So we're going to write the mode. Then we're going to write the hash. So we're going to write space. Tree or blob. We don't know yet. Right. So let kind is let's just say that that's tree for now. So the kind is going to go there. And then a space. And then the hash. 
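Pausing for a reference point: pulled into one place, the entry loop being assembled here looks roughly like this sketch, assuming the anyhow and hex crates and a reader already positioned past the "tree <size>\0" header. The kind column is hard-coded for the moment, exactly as on stream; looking it up properly is the refinement that comes next.

```rust
use anyhow::Context;
use std::ffi::CStr;
use std::io::{BufRead, Read, Write};

fn print_tree_entries(mut reader: impl BufRead, name_only: bool) -> anyhow::Result<()> {
    let mut buf = Vec::new();
    let mut hashbuf = [0u8; 20];
    let mut stdout = std::io::stdout().lock();
    loop {
        buf.clear();
        // each entry is "<mode> <name>\0" followed by 20 raw SHA-1 bytes
        let n = reader.read_until(0, &mut buf).context("read next tree entry")?;
        if n == 0 {
            break; // end of the tree object
        }
        reader.read_exact(&mut hashbuf).context("read tree entry hash")?;
        let mode_and_name = CStr::from_bytes_with_nul(&buf).context("invalid tree entry")?;
        let mut bits = mode_and_name.to_bytes().splitn(2, |&b| b == b' ');
        let mode = bits.next().expect("splitn always yields at least one item");
        let name = bits
            .next()
            .ok_or_else(|| anyhow::anyhow!("tree entry has no file name"))?;
        if !name_only {
            let mode = std::str::from_utf8(mode).context("mode is always valid ASCII")?;
            // placeholder: the real kind comes from reading the referenced
            // object's header, which is added right after this
            let kind = "tree";
            write!(stdout, "{mode:0>6} {kind} {}\t", hex::encode(hashbuf))?;
        }
        // names are not guaranteed to be UTF-8, so write the raw bytes
        stdout.write_all(name).context("write tree entry name")?;
        writeln!(stdout)?;
    }
    Ok(())
}
```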
And then I guess this is probably some like spacing to make them all be aligned. But let's do a single space for now. And then it'll be the actual name. So this is right. tree entry hash just it out. So all right, let's do a cargo run of let's do a get Ellis tree of Do we have a tree here? This is a tree, right? Yeah, this is a tree. Okay. A moment of truth, I suppose. Cargo run. That looks right to me. Right? And if I do name only, prints only the names. The hash isn't printed raw. It's stored raw in the file. We do need, yeah, so there are two bits we need to fix here. One of them is aligning the mode. So that should be zero padded to six characters. Luckily, that's kind of easy. So we actually know that the mode is UTF-8. So here we can do let okay mode is mode.toStir. Mode. Stir from UTF-8. In fact, isn't there a mode.ascii? That's fine. Std stir from UTF-8 of mode. And we can error here too, actually. We don't need to do that. We can say here that the mode is always valid UTF-8. And now we can do this. Mode colon 06. Colon left? No. I think it's just colon. It's fine. It's just a meta. Am I just being silly here? There we go. Just needed to get the characters in the right order. So now you see this entry here is right aligned with zeros padded on the left. which is indeed what we wanted. And the second one is that they're all listed as trees, which is obviously wrong. The way for us to fix that part is we will need to read out each object to figure out what type it is. And so the way to do that is going to be very slightly painful, but it's going to be object is object read of the hash, which conveniently we have right here. And so this is read object for tree entry. And here we can be a little helpful so we can give the object hash. A hash. And now this is the object kind. And notice that we don't do the rest of the readers. So we don't actually stream in the entire file contents. We just read the header. And now these are all blobs and this one is a tree. And so in fact, we should be able to now sort of recurse, right? By calling ls tree on this. And now it works further down. Amazing. It does the right thing. Okay, makes me happy. We now have ls tree. Implement ls tree. Hit push. Let's see what it thinks. Alt has passed. Back to the browser. Boom. View next stage. Write a tree object. Ah, so this is the first time we're gonna... Well, I guess technically we created a blob object with our write object earlier. So write tree... Do we actually need to implement the staging area? Yeah, exactly. We won't implement the staging area. We'll just assume all the files in the working directory are staged. Create a file with some contents. Git add... git write tree. The output of git write tree is the 40 character SHA hash of the tree object that was written to git objects. We'll have to walk the files in the working directory, create blobs for every file. For directories, recurse into them, create tree objects, record their hash, and walk all the way up. Okay. So let's see, it's 1. Okay, so we're at the 2 and a half hour mark. Wow. Time flies when you're having fun. That means I'm going to go make some tea. And then I'm going to be back in a second. And then we'll take a tree object. Okay, I'll see you all shortly. I have returned. I have tea now. It's the volume really low. I hope not. My audio gear says that it's good. Does Rust not have an organized imports equivalent like Go does? So Rust Analyzer does. 
You see it sort of, when I automatically, when I use the code action to add an import, it will sort of reorder them and stuff. It doesn't automatically remove unused imports, but if I have something like a hash here, then there is a code action to remove unused imports. There's one for the merging imports as well. So it does have them, I just don't use them very well. Okay. So now we're writing tree objects, huh? So that's back to our hash object thing that we had earlier. Right, so we'll want something very similar, except that we want to be able to write things that are not necessarily just a blob. The interesting thing here with trees, and this gets at the thing we originally had with files, right, with blobs, is that you don't know how long the input is until you've assembled all the input. And in the file case, we could sort of cheat because we could stat the file. But even this is kind of technically dodgy, right? Because there's technically a race condition here, which is imagine that the file changed between when you stat it and when you actually stream it through the encoder. If that happens, then what we'll end up doing is write the wrong size in there. So like, you know, we're sort of cheating here. Let's see here. My T alarm went off. So I'll write a... Technically, there's a race here if the file changes between stat and write. I would write in the contents of the file first and then go back to append to header work. It might, although prepending to files is kind of annoying. Like appending is pretty easy. Prepending is usually expensive. The way I would probably do this instead is you... There are cheats you can do, right? So you can, because this is encoded as an integer, you can always prefix zeros to an integer. So you just write out a bunch of zeros and then you just replace those bytes after the fact, but that's really not very nice. Another alternative is you write the raw bytes to a file and then you read that file out, which you know is not going to be modified, but even then it's kind of dodgy, or you write it all to memory and that way you're guaranteed, but it is a little weird. And... So it's all like, it's a little painful regardless. I think what we'll actually do here is... I think it just has to be in memory. And the way this is probably going to work in practice is I'm going to change our object thing here. And I'm going to give it a right. And I think I know some people are going to hate this. And right is going to take a... It's going to take a kind. This is going to take a size. And it's going to take a writer. So object here is like kind of a lie, right? Because we don't really have methods on object. We just have constructors to it that we're essentially namespacing. And then... Is this what I want? I'm not sure if this is what I want. I don't think it's going to give you back an object actually. I think it's going to just do this, which is kind of stupid. But what's interesting is that this can actually be a method on object where we basically say that this R is any IO. So it can be a reader or a writer. You get it back as a reader if you call read, but you can construct one with a writer and then you can call write. It's pretty gross, though. But here, let me show you what I mean. So it'll take a self, and it'll return the object ID. And the way it'll do that is here. And so we'll steal the hash writer from over here. Write. Import that. Import this. And this is going to write now self.kind, self.expectedSize, self. 
And I'm going to rename the reader field. I'm not going to let reader be a writer. We're going to import digest. And then this is going to be from self.reader. No. No. The. The thing inside object needs to be the... I'm basically trying to decide whether the thing inside of object is the object we want to write or the writer that we want to write to. Maybe it actually is the reader. Maybe write takes... Maybe it actually is pretty reasonable here. So maybe this is a read. And then write takes a w that implements write and it writes into that w. And so this does actually write into the writer. And then this is the self reader. And then we return OK of the hash. And this has to be dot into. And this has to be a mute self. Right. And when I say write, I mean write as in R-I-G-H-T, not W-R-I-T. Yeah, we could use we could use simple right here. That's fine, too. And so now this let's see if we can actually rewrite this one in terms of the other one. Now we should now be able to do all object. Kind is kind blob. Expected size is stat.len. And reader is file. And now I should be able to dot write that. into writer. Like this. And then the hash should be the outcome of that. Right? This goes away, this goes away. That's not too bad. And the reason why this is kind of nice is because here the reader is a file, but because vectors implement read, we can also just write as we will for trees, for example. We could take a string and we could provide that as the reader, which will then have that be the thing that gets written in. So I think this will actually turn out pretty nicely. And then it remains true that this is a reader. Thank you. And so now what is the type it wants us to implement? It wants us to implement writeTree. Okay, so let's go back to main here. writeTree. And writeTree takes no arguments. Or at least for now. right tree. And that's going to look a lot like hash object. So let's start from there. But what it will do is Oh, actually, here's another thing we could do. So let me let me make this a comment this out for now. This thing that actually goes through a file might be a handy convenience thing on object, which is we could actually have. Oh. No. No. We could have from blob from file, which takes a azref path and returns this. That actually does this bit for us. Just because we might end up using the same thing in the tree writer. um let file is file.asref uh this is uh groovebox dark hard is the color theme and so this is actually going to return a uh buff read is fine doesn't really matter Now, the weird thing here is that... Oh, yeah, no. Okay, so the reader here is always one without the header. The reader here is always a raw reader, so that's fine. This only guarantees that it implements read. Which is kind of interesting actually because we may want to do a buffered reader here to boost the runtime performance. But I think we can ignore that from now. So that means this is always going to return a kind blob. So now this should be blob from file of file. And... These now go away, this goes away. This now becomes open blob input file. Sounds pretty nice because now we can have that exact same code be in our right tree. How do you set up line reps in your NeoVim config? Is it on save or do you have a hotkey for that? It's on save. My Rust analyzer is set to do Rust format on save. Okay, so if we now go back to write tree. So our task here is going to be to walk the current directory. And I guess we actually want to walk it. Yeah, I think we need to do this recursively, right? 
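Before the recursion, here's roughly where the write side has landed after that refactor — a sketch assuming the Object, Kind, and HashWriter types from the earlier sketches plus flate2 and sha1. The object streams through a HashWriter into a ZlibEncoder over whatever writer you hand it, and the SHA-1 of the uncompressed header-plus-contents comes back:

```rust
use anyhow::Context;
use flate2::{write::ZlibEncoder, Compression};
use sha1::{Digest, Sha1};
use std::io::{Read, Write};
use std::path::Path;

impl<R: Read> Object<R> {
    // Streams "<kind> <size>\0" plus the contents into `writer` (compressed),
    // returning the SHA-1 of the uncompressed stream.
    pub(crate) fn write(mut self, writer: impl Write) -> anyhow::Result<[u8; 20]> {
        let mut writer = HashWriter {
            writer: ZlibEncoder::new(writer, Compression::default()),
            hasher: Sha1::new(),
        };
        write!(writer, "{} {}\0", self.kind, self.expected_size)?;
        std::io::copy(&mut self.reader, &mut writer).context("stream object contents")?;
        writer.writer.finish().context("finish zlib stream")?;
        Ok(writer.hasher.finalize().into())
    }
}

// Builds a blob object for a file on disk without slurping it into memory.
// (There is a mild race here: if the file changes between the stat and the
// streaming, the size in the header will be wrong.)
pub(crate) fn blob_from_file(file: impl AsRef<Path>) -> anyhow::Result<Object<impl Read>> {
    let file = file.as_ref();
    let stat = std::fs::metadata(file).with_context(|| format!("stat {}", file.display()))?;
    Ok(Object {
        kind: Kind::Blob,
        expected_size: stat.len(),
        reader: std::fs::File::open(file).with_context(|| format!("open {}", file.display()))?,
    })
}
```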
Because anytime you hit a subdirectory, you need to construct a tree object for the subdirectory, return the hash, and that's the thing that goes into the parent. So here we're going to need a sort of write tree for, which is going to take a path and return an anyhow result of a U820. And so this is where we're going to write our recursive thing. There are crates that help you write this. There's one called walkdir. Walkdir is really nice for this. There's also one on top of this called ignore. This is what ripgrep uses, for example, and it knows about things like gitignore files, which might actually be really handy for us here. And I believe that you can tell the walk builder to have a max depth of one. So we could use this one and that way we get things like git ignores for free. For now, let's do it the straightforward way and we'll see if it comes back to bite us. So dir is going to be FS. read dir of path context read directory and I guess we can do this so that it's we could even do open directory this and say path dot display and then we'll do while let sum entry is dir.next. And I think actually this is going to be an option because it is technically possible that you have a directory, for example, with no files in them. And git will just not represent empty trees. They will just be completely skipped from the thing that you create. So I think we want this to be an option. So while entry is this, Oops. Then let entry is entry dot. And then when you walk over things in read dir, then every individual entry might also error. Bad directory entry in. What am I doing with context that goes there? This goes here, footpath.display, like so. And I got to write this correctly. So what we want to do when we create a tree is we want to create that same representation that we've been talking about, right? So we want a string. In fact we want a vec, so the tree object is going to be a vecnew and for each entry the thing we want to write to the tree object is the mode? the name, a literal null character, right? Mode, space, name, literal null character, and then the hash, right? That's ultimately what we want to produce. And because the hash is not a string and not hex encoded, we're actually going to have to do tree object dot push. Well, I guess dot extend. hash. So we need to compute the mode and the name. The mode is going to be match on entry.metadata.context metadata for directory entry. Meta is this. And I guess we'll also grab the path out because we're definitely going to need the path. Like so. Isn't there also entry dot this file, we only really need the file name. We don't actually need the path. File name is entry dot file name. And then I think we want to match on meta.permissions. Well, I guess mute mode is if entry dot... Where's the file type? That's really what I want. Oh. Type is equal to this file type for directory entry. I guess technically... That might be in the meta. Yeah, it is. Okay, great. So the mode, the sort of initial state of the mode is if meta dot is dir, then this is going to be zero, zero. No, just four. They write it without leading characters. Otherwise, it is going to match on meta.permissions. Or what's the... There was some set of rules here somewhere. Recap. Mode, That's not helpful. It was in the previous exercise, I think. Description, can I go back? Yeah, tree object. Tree objects. For files, the valid values are these, these, and these. Okay, so this is going to be else if meta.isSimlink, then it is this string. Else if meta.permissions. Right. 
That is a thing that's only available on Unix. No. Why does my... oh, have I somehow changed my search engine? That's not what I want. I want rust... that's why my things aren't working anymore. What is it called? Permissions... Yeah, permissions.ext, which is Unix only. So... Yeah, because I think there isn't actually a way to check for executable on Windows, is it? Like there might not be a flag like that. It would be under fs... while ext? No, I thought I metadata ext. File attributes. Yeah, okay. So there might be a way to pull it out through this. I think what we'll do here is we'll go the Unix-only path for right now, which is to grab this guy, which then gives us meta.permissions.mode. And we want to see whether that ended with and what is the bit for perm... How do you even determine whether it is executable? I suppose we could just take the mode, right? But it seems like Git is a little bit more restrictive about which modes it actually allows in there. I think what I'll actually do here is just say... So... This is octal. 1, 1, 1. Not equal to zero. Has at least one executable bit set. I think that's right because the way you construct executable bits is you go from, I think one in any given octal position is the executable bit. because read and write is 6, 6, 4, 4, for example, is read and write for the owner and read only for group and other. So that means the 4 is the read bit, 2 is the write bit, and 1 is the executable. So this is, if we end it with all the 1 bits and it's not equal to 0, that means that at least one, at least one... of the executable bits are set. And so therefore, I think this is what we roughly want. You could imagine that we only do this if if like the owner bit is set or something, but I think this is fine. Great. else if, let me use some link. We don't need the mute. The file name here is an OS string, right? So we can't necessarily write that out like this either. So we need to do this. And then we need to do tree object dot extend file name. And then we do tree object dot push of a null byte. And at this point, like, why even? Why even use write, right? What? No, an OS string is definitely valid as just bytes. Right? Give me OS string. Where is OS string? An OS string better be valid as just a bunch of bytes. Really? Oh, as encoded bytes. Okay, fine. As encoded bytes. Right, and we don't have the hash yet. So the hash is going to be, hash is if meta.isdir. Then, and this is where we get into the recursion. Then it is write tree four. entry.path If it's not, then it is the thing we made for write object which is this from file entry.path I guess that means I might as well just pull it out out here. And this needs to write to so this is also the question of where that goes. And this is where this extra business is needed. Do we want to put that somewhere? No, I'll... I'm going to replicate this for now. It's annoying, but it is what it is. So that's going to do this. And that means we actually also need to hex encode the hash in order to use here in the path that we generate. And then If and this is actually let some hash is equal to this and this is where we end up skipping. So this is empty directory. So don't include in parent. And then that's going to be the hash. Amazing. I think that does it. Right? So this then, the main bit we have left is we need to decide here what to, so this extends the tree object to include all the lines. And then I guess it's if... If tree object is empty, then we return okay of none. So that means there were no entries. 
Otherwise, we return, and this is where the hash is going to come in. We do this. Object. Object. We're going to have to do the same temporary dance here, aren't we? Yeah, we are. That makes me think that we probably need to make that be more standardized. So this is going to be object to sort of fill in the bits here first. Kind is going to be kind of tree. Expected size is going to be tree object dot len. And reader is just going to be the tree object because vector implements read. unsatisfied trait bounds what doesn't I oh we might need a cursor wrapper I guess it does not let's see look at read maybe maybe it says there it does not okay so then I think it is cursor Yep, cursorVec. So we'll want here a cursor new. Cursor now. I want a cursor now. This one. So this is then going to be okay of hash. Okay of sum of hash. File for tree stream file into tree. stream tree object into tree object file, create subdir of git objects, move tree file into git objects. So clearly this code, we now have repeated a bunch of times. So that's not super nice. And that means we'll go over here and we'll probably make our write thing have to be a little bit smarter. We probably still want write as a sort of convenience method for testing, for example. But let's do a write to objects. And that's going to have all of this logic here. Which is going to do... this. And now, of course, again, we will fix this at some point. It just doesn't matter at the moment. So now this can all become a lot easier because we can say write to objects. Write to objects does not need to take a writer at all. It doesn't need to take mute self. It just takes the object. And then this is just right tree object. And then this becomes okay of sum of that. And similarly, now we can also simplify hash object, which now just becomes... Now just becomes... This one's now a little bit more annoying, but I think actually this one's okay now because we don't even need this anymore. We can just do like this method is this helper is now so small that it can go away. Uh, so this goes away and becomes this, uh, and this, uh, becomes this. write to objects into git objects into blob object. And this is, I guess, really object file. And then this all goes away. And in fact... We can still keep this one. Right? That remains the same. So this is now just where do we write the object. And the hex and code we have to do regardless. So we can say here that this is going to be hex and code of hash. So that makes this a little nicer by going here. And this makes things nicer. That's pretty short and sweet, wouldn't you say? And so now we have write to objects there. It still uses this single thing called temporary, but that I think is okay. This is now just write to objects, outputs the hash, recurses, does all of that stuff. And so in theory now, we should be able to do hash is write tree for path new of dot. Construct root tree object. And that's just gonna make the hash. And then we print the hash. Right? Right? I think that's... Oh, I guess let's some hash. And so this would be like... Anyhow, anyhow. No, anyhow, bail. asked to make tree object for empty tree. And then I suppose we may want to at least hard code that it shouldn't commit the git directory. And so what we'll do up here is if file name is equal to.git then continue. Otherwise this will potentially recurse forever, which is not what we want. I guess the question becomes... Uh... 
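Condensing the last few steps, the recursive walk ends up shaped roughly like this — a sketch leaning on blob_from_file and the Object/Kind types from before, plus a write_to_objects helper that does the temporary-file-then-rename dance described above. The mode strings follow the challenge text, the permissions check is Unix-only, and entry ordering is deliberately ignored here, since getting that to match Git is the next battle:

```rust
use anyhow::Context;
use std::fs::Metadata;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

fn entry_mode(meta: &Metadata) -> &'static str {
    if meta.is_dir() {
        "40000"
    } else if meta.file_type().is_symlink() {
        "120000"
    } else if meta.permissions().mode() & 0o111 != 0 {
        "100755" // at least one executable bit (owner/group/other) is set
    } else {
        "100644"
    }
}

fn write_tree_for(path: &Path) -> anyhow::Result<Option<[u8; 20]>> {
    let mut tree = Vec::new();
    for entry in
        std::fs::read_dir(path).with_context(|| format!("open directory {}", path.display()))?
    {
        let entry = entry.with_context(|| format!("bad directory entry in {}", path.display()))?;
        let name = entry.file_name();
        if name.to_str() == Some(".git") {
            continue; // never descend into the repository's own metadata
        }
        let meta = entry.metadata().context("metadata for directory entry")?;
        let hash = if meta.is_dir() {
            match write_tree_for(&entry.path())? {
                Some(hash) => hash,
                None => continue, // git does not store empty trees
            }
        } else {
            // hash (and store) the blob; write_to_objects streams the object
            // through Object::write into a temporary file, then renames it
            // to .git/objects/xx/yyyy...
            blob_from_file(entry.path())?.write_to_objects()?
        };
        // "<mode> <name>\0" followed by the raw 20-byte hash
        tree.extend(entry_mode(&meta).as_bytes());
        tree.push(b' ');
        tree.extend(name.as_encoded_bytes()); // names need not be UTF-8
        tree.push(0);
        tree.extend(hash);
    }
    if tree.is_empty() {
        return Ok(None);
    }
    let hash = Object {
        kind: Kind::Tree,
        expected_size: tree.len() as u64,
        reader: std::io::Cursor::new(tree),
    }
    .write_to_objects()?;
    Ok(Some(hash))
}
```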
Right, I guess I need to git add dot and then I want to see if I do a git, what's the comparison they give you here? They say, oh, write a tree object. Right tree. I could run this in a different directory, but where's the fun in that? If I do right tree, what does it give me? Okay. And if I run cargo run, right tree. I was about to say, yay, it gives the same hash, but it's just the first character that's the same. So that doesn't really work. Oops. Cursor. Well, it did something. Is there something else that we end up including that they don't? Oh, I wonder if it's the alphabetical ordering. Because we walk these in... we walk these in random order, whatever order readDir here gives. But I think that should be... Actually, we can find this out, right? So we can do ls tree of the object we just constructed. It's a valid tree. At least Git thinks it is. And in fact, all the entries... No, something's not the same. Yeah, it's the ordering. See, in our tree, we print out cargo lock, then readme, then codecrafters. They print out cargo lock, cargo toml, readme, then codecrafters. So I think it really truly is just the ordering. That's pretty promising. So that means we're actually going to have to... Sort all of these first. That's not the end of the world. Although sorting is interesting because do they sort? I guess they have to sort by byte value, right? Because otherwise you get a different ordering on, for example, Windows versus Linux. Or do they actually like understand which? which encoding you're using the file names. I don't think that's reasonable. I think they probably just do like ASCII sort of the file names. Like what does sorted mean, right? Do they tell us down here? That's fine. That's fine. Ignore the git directory. Yeah, that's fine. But what does ordered mean? I don't think there's a sorted reader Yeah So I think the way we're gonna do this is entries is a I guess we could do a B-tree here given we're sorting them anyway. It's a B-tree map. And then we'll do entry and then we'll do entries.insert entry.filename. No, I think I want this to be a vector. Yeah, I'm just going to have this be a vector. And then I'm going to do entries dot push entry. And then I'm going to do entries dot sort unstable by key entry, entry dot file name. And then this is just going to be for entry in entries. Still not the same. So that's ours and that's theirs. They're now in the same order. And the hashes are all the hash for the nested directory is not the same. The hash for source for us is whatever this is. And the hash for them is whatever this is. They order, ordering here is different. So they order shorter strings first. No, we order shorter strings first. They order shorter strings last. Why? Okay. Interesting. So they consider... commands.rs Let me double check that I'm not lying here. So git write tree is the one that's 401. 401 is the one that has 446. 446 is the one that has commands.rs first. So they order commands.rs before commands. Whereas we order commands before commands.rs. No, they don't have directories first. This is the output from Git. And the output from Git does not have the tree first. It has a blob first. So they specifically order differently here. They order with... If two things share a... If one thing is strictly longer than the other, the longer thing gets ordered first, which is not how string comparison works in Rust. Well, that's interesting. So in that case, we're going to have to do... No, it's not files versus directories. 
This is entirely based on the name. Again, they order the whole line entry. No, they don't order the whole line entry. Then BD would have come before 99. Or rather 99 would have come before this. It is just the name. But that means that our sortBy up here actually needs to be... A and B. And I think... So there are two ways we could do this. Oh, it's probably because they sort including the terminating null byte. Yeah, it's the same thing someone's getting at in chat. So if you sort with the terminating null byte... then null comes before all other characters, and so the null would come first, which is what they do? No, that's the thing. They don't sort that way. We sort that way, right? This first one is ours, and this one has a sort of null at the end here, right? And that one comes before the one that doesn't have a null there, which to me seems reasonable, but in the git one, it's the other way around. So I think it really is. We may actually need to, it may even need to be the stupid. So if afn dot, oh, really? This is an OS string. I'm right. to as encoded bytes, as encoded bytes. In fact, that's an interesting question. So if I go to OS string, what is their implementation of ordering? I would sort of assume that their implementation of ordering is just by the bytes. Yeah, it is. Okay, great. Yeah, you can think of null as having a really high weight for them as opposed to a low weight. And then I think what we want to do is if afn.startsWithBFN, then return... the if afn starts with bfn that means afn is longer so then we want to return afn um right if uh if if bfn starts with afn so Then we want to run return BFN. Otherwise, we want to return A compare B. And this needs to be ordering. So this means that... This means A comes before. This means B comes before. I may have confused myself here. Oh, finally returns an OS string. Can I have it just give me the thing without allocating? Because that's what I want. Really? Entry. Where's my dear entry? Wow, they all need to allocate. All right, that's fine then, I guess. That means this has to be this. And then let AFN is AFN. As encoded bytes. BFN is this. I guess because they're owned we can pull this trick instead. So because they're owned we can do the following. Into encoded bytes. This is at least assuming that's the method. Okay great. So then I can do afn.push. And BFN dot push, and then I can return compare if and to be a fan. Git considers shorter strings to come after longer strings, which effectively means the terminating character. is valued high, not low. Get add, get right tree. That's the same hash. Yeah. Nice. Push will probably need to reallocate. You think so here? Do you think this actually allocates like a... File name, OS string. It's really annoying that they don't expose this method. I guess since we're on Unix, right, there might actually be a Unix version of this. So if we go here, that gives me access to the raw bytes. Der entry ext. No, that's not helpful. Okay, that's fine. Yeah, you might be right, because the file name here is going to get back the OS stir. And then I'm gonna call OS string, which calls the to owned Which calls the 2vec. So what does 2vecin do? Ah, unclean, 2vec. You're probably right that it won't over-allocate. The allocation is sad here. This probably allocates. Which is sad. If your directory contains a dot, get sorts normally, shortest before. What? Really? If your directory contains a dot. Okay, let's do a make their foo dot. Touch foo dot. No, not foo dot. 
Make their source commands dot. Touch that slash something. Get add dot. Git write tree. Git ls tree this thing. What? What? What the fuck is this ordering, Git? What is this? I know. That makes no sense. Okay. Okay. Go home, you're drunk. It's got to have to do with... I agree with you. I think it has to do with extensions somehow. Like, it's... Maybe it's everything with extensions before anything that doesn't have an extension? But like... What about if there are multiple extensions? Like... Like, what does that get ordered as? Does dot come before... Dot does come before the letters. So that's not necessarily wrong. RS dot? Nope. Only applies one level deep. What? This is just ridiculous. What is this ordering? It's not plain files. I guess we can test if it's plain files, right? Like if I do... Make the source... Source foo. foo.bar And then I touch... foo bar s and foo dot s and I guess I'll do a this as well and I do get add dot and then get right tree and then get LS tree this and foo dot bar dot and this. Now I'm just trying all the things. I just want to understand. Who dot bar dot dot Maybe you are right, maybe it is files first. But it's like in a weird way. Because like if you compare here, these two, so here the non, the one without a dot at the end comes first. But if you compare foo.bar. and foo.bar, that's the same pattern, but they're the one with the dot at the beginning comes first. So I think you're right. I think the type does affect the ordering here. But like, what's the rule? Because it's not as though if you have a file and a tree with the same name, then the file comes first, because they cannot have the same name. So I think it's like ignoring what comes... Okay, here's what I think it does. I think it removes the extension. And then it orders them... By name, then type. I think that's what's going on. I think it removes the extension, orders by name with extension removed, then orders by type. And I guess then by the extension? I think that's what's going on. But but how many extensions do they remove? Like to into there's a path buff from of this, I believe. It compares conflicting directory file entries as equal, but these aren't conflicting. It compares conflicting directory file entries as equal. Okay, but they're not conflicting here. They're different names. Note that while a directory name compares as equal, I'm just reading someone who's looked up the Git source code. I guess, let me pull this up. So, Git source code, GitHub. Git, sure. Go ahead and give me something like probably... Tree, DiffNameCompare is identical to BaseNameCompare except it compares conflicting directory file entries as equal. Note that while a directory name compares as equal to a regular file, they then individually compare differently to a file name that has a dot after the base name. Right, but we don't have conflicting entries. I think I'm going insane. There's got to be something where they do something about this trailing dot. What do they do for isder? Oops. No. No. Isder. But that's comparing the mode. And in fact, this first thing just straight up compares the names. So this is only if they are... Name one... Yeah, and the problem here is the names aren't the same, right? So we should be taking this first branch and all of this logic shouldn't matter. Ah, they take the shorter of the lengths. And they compare up to the shorter of the lengths. Okay, this is very cursed. No! No! Okay. Oops. So, here's what it does. It doesn't look at the extension. 
But it does look at the mode. So: let common_len = min(afn.len(), bfn.len()) — that's step one, with into_encoded_bytes on both. Then we match afn[..common_len].cmp(&bfn[..common_len]): if the ordering is Equal we continue, otherwise we return it. Then, if afn.len() == bfn.len(), we return Ordering::Equal. And the name[len] bit — len here is the shared length — is just checking what comes after the end of the shorter name... oh, and it adds a slash at the end if the thing is a directory, purely for comparison purposes. That's base_name_compare, as opposed to df_name_compare. Okay, we can use the base one instead, because we're not handling potential conflicts, which might make this a little easier — but it still does the same thing: compare up to the shortest, then look at what the next character would be. So c1 (which is actually a u8) is afn.get(common_len), c2 is bfn.get(common_len) — that's the equivalent of what they're doing there — and then instead of an unwrap_or_else we'll write it as: if the entry is a directory, the next character is a slash, else None. And the directory check needs a.metadata, which means we'd have to extract it inside the comparison, which is awful. So instead we'll extract both the file name and the metadata when we build the list: we push (entry, name, meta), and then in the comparator this becomes a.1 and b.1 for the names, a.2 and b.2 for the metadata.is_dir(), with a copied() on the get. And then what do they do? They do this: if c1 < c2 return -1, if c1 > c2 return 1, else 0 — which is exactly what c1.cmp(&c2) does. And further down this can destructure into (entry, file_name, meta), because we already extracted them. Okay: cargo run write-tree. We still get the same hash. Wait — we get the same hash, so that means we succeeded, including all the weird test cases we just added. Okay, so we did it, right? Yay? Git has very specific rules for how to compare names; that's fine, as long as what we got is correct. Celebration? I guess git add src/foo, src/commands.rs., and the commands directory with the dot, commit "implement write-tree", push. All tests pass? Congrats? Okay, I guess we get the fireworks. Weird. So the "write a tree object" step itself we actually got through pretty quickly — and then we spent two or three times as long on directory entry ordering. But hey, we still did it.
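Stepping back, here's roughly what that comparator boils down to. This is a sketch of my reading of git's base_name_compare, not a verified line-for-line port; in the stream the names and is-dir flags come from the DirEntry and its Metadata rather than being passed in like this.

```rust
use std::cmp::Ordering;

// Compare two tree entries the way git's base_name_compare does (as I read it):
// byte-compare the shared prefix, then compare the "next" byte, where running
// off the end of a directory name behaves as if the name had a trailing '/'.
fn compare_entries(a_name: &[u8], a_is_dir: bool, b_name: &[u8], b_is_dir: bool) -> Ordering {
    let common_len = std::cmp::min(a_name.len(), b_name.len());
    match a_name[..common_len].cmp(&b_name[..common_len]) {
        Ordering::Equal => {}
        o => return o,
    }
    let next = |name: &[u8], is_dir: bool| -> u8 {
        match name.get(common_len) {
            Some(&c) => c,
            None if is_dir => b'/',
            None => 0,
        }
    };
    next(a_name, a_is_dir).cmp(&next(b_name, b_is_dir))
}

fn main() {
    // "foo.bar" (file) sorts before the directory "foo", because the directory
    // compares as "foo/" and '.' < '/'.
    assert_eq!(compare_entries(b"foo.bar", false, b"foo", true), Ordering::Less);
    // A plain file "foo" still sorts before "foo.bar".
    assert_eq!(compare_entries(b"foo", false, b"foo.bar", false), Ordering::Less);
}
```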
Okay, next up: the git commit-tree command. Let's go to main and add a commit_tree subcommand. It takes a message and a hash: so commit_tree takes a tree_hash, which is a String, and a message, which is also a String. You could implement things like spawning an editor when -m isn't passed, but we'll leave that for never. Okay: commands::commit_tree, and commit_tree::invoke(message, tree_hash). All right. Now let's actually read the challenge description. A commit object contains information like the committer and author name and email, a timestamp, the tree SHA, and the parent commit SHA, if any. Ah, -p for parent, I see. So the idea is that you can create a commit without a parent, which would be the initial commit, or one that does have a parent. That means in our main we need a tree hash, and also a parent hash, which is optional. Let me go ahead and generate this function; we're probably going to need most of this. The interesting thing is I thought we could reuse pretty much all of the write-tree code, just by making it pub(crate) — but no: the tree is already given to us, so we don't even need to reuse it. The expectation is that we're simply handed a tree hash. In that case this is pretty similar to just the final bit where we construct the object: the kind is Commit, we don't yet know exactly what the commit will look like, but it's probably going to be formattable, so this will be commit.len() and the commit itself; the hash is this; and then, just like down here, I'll print the hash and be done. The function returns an anyhow::Result<()>. The question then becomes: what do we put in the commit? And I think that part is pretty easy, because it's just... what does a real commit look like? It's just this kind of thing, right? But is that actually the encoding of it? Ah, the format. By the way, we have one of the people working on CodeCrafters in chat, and someone pointed out that the ordering cases we just went through — the name reordering — can't possibly be part of the test cases, right? And the CodeCrafters person said: it wasn't so far, but now it's going to be. Nice. Okay. Aha, here's the output: the content really is just ASCII-encoded text. Well, if it's just an ASCII string then this is trivial: tree is the tree hash. Actually, I don't want to do this with a single format!, because the parent hash is optional. Instead we do String::new() and then write! into the commit — you can write into Strings, which is really nice. We writeln! the tree line with the tree hash; that's the tree for the commit. Then if let Some(parent_hash) = parent_hash, we writeln! a parent line with the parent hash. Then we write the author line, which we'll just hard-code for now, and set the committer as well. Then we have to write an empty line — it seems to me, at least, that there's an empty line in between here. And then we writeln! the message into the commit. It's unclear whether there's a newline at the end... 0a — yes, it does look like there's a newline at the end. Okay. Then it wants me to import Write; Ordering can go away, fs can go away, these two can go away; this is just the message. Isn't there a way to use writeln! on a String without it returning a Result? I was so sure there was, but no, it's just the macro, so we have to unwrap. These errors will never actually trigger, since we're writing into a String.
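Here's a minimal sketch of that commit-text construction. The function name, the placeholder author/committer lines, and the placeholder tree hash are mine; the "commit <len>\0" header plus SHA-1 plus zlib step is assumed to be handled by the same object-writing code used for blobs and trees in the stream.

```rust
use std::fmt::Write as _; // lets us write!/writeln! into a String

// Build the text of a commit object: tree, optional parent, author, committer,
// a blank line, then the message with a trailing newline.
fn commit_text(tree_hash: &str, parent_hash: Option<&str>, message: &str) -> String {
    let mut commit = String::new();
    writeln!(commit, "tree {tree_hash}").unwrap();
    if let Some(parent) = parent_hash {
        writeln!(commit, "parent {parent}").unwrap();
    }
    // Real git fills these in from your config and the current time.
    writeln!(commit, "author Example <you@example.com> 0 +0000").unwrap();
    writeln!(commit, "committer Example <you@example.com> 0 +0000").unwrap();
    writeln!(commit).unwrap();
    writeln!(commit, "{message}").unwrap();
    commit
}

fn main() {
    // 4b825d... is the well-known hash of the empty tree, used here only as a
    // stand-in for whatever write-tree produced.
    let text = commit_text("4b825dc642cb6eb9a060e54bf8d69288fbee4904", None, "initial commit");
    // The on-disk object would then be "commit <len>\0" + text, SHA-1 hashed
    // and zlib-compressed, just like blobs and trees.
    println!("{text}");
}
```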
We'll add the unwraps anyway so the compiler is happy. Did we really make it that easy for ourselves? Oh, hang on: a 40-character SHA. Yeah, 40 characters, so 20 bytes — that's fine, it's the same kind of hash, the same length as this one. Yes. Okay, great. I think that's all: we construct the string, make the object, and write it out into the object store. Right. Commit. Let's do a git rev-parse HEAD, then cargo run commit-tree -p that something... what? Lies. Oh, it's called commit-tree — is that not what I named it? Am I just blind? "commit-tree must give exactly one tree" — yes, I do indeed have to pass a tree, which is fine, because that's going to come from git write-tree, so that goes at the end here. Okay. And now, moment of truth yet again. Okay, we produce a different hash. Well, we're hard-coding a bunch of things in here, like the commit timestamp, and I thought that was to our benefit because it means the hashes should be the same — oh, but right, git will put a real timestamp in its version. Interesting. Here's something fun we can try, though. Let's git cat-file -p the commit we just wrote: that seems right to me. What if I ls-tree the tree of that commit? Okay. So here's something we can try. Remember, what I just did was basically create a commit of the current directory. So if I now git commit "implement commit-tree", there's a commit at HEAD. Now imagine I try to... what's the way I want to do this? I'm going to create a branch called foo, check out foo, look at the log, and reset foo to origin/master, so it no longer has the commit-tree commit. Then I'll run git write-tree, then cargo run commit-tree with that tree... no, I need the current commit as a parent as well: the tree I just wrote up here, plus the commit that's at the head of foo. Oh, I don't have commit-tree on this branch. Fine, fine, fine: git reset --hard master. I just wanted to do this on foo so that if I've done something completely wrong I don't mess up my tree in a way that's annoying to recover from, but we can do this on master just fine. So: write-tree, rev-parse HEAD, commit this tree with this parent, and now try to reset the foo branch to that commit and do a git log. Aha! Master is where it was, and foo is the commit we just created, called "something", and it has the correct parent. If I git diff master against where we are, there's no difference, and git show -p HEAD shows an empty commit. Okay, so now let's echo hello > world, git add world, write-tree — but notice that I haven't committed that tree, which is going to be important. Now the parent is going to be this, the message is "commit world", and the tree is the one with the world file added. That gives me a commit hash. And now I can reset --hard — in fact, I can rm world and git add ., so there's no diff and no world file — and then reset --hard the foo branch to the commit I created with commit-tree.
And if I now look at git log, I have two commits on foo since master: "something" and "commit world". If I diff from master onwards, I have a world file containing hello, and git show -p HEAD gives me that commit. Okay, so this is definitely a working commit-tree. Amazing. That's really cool. Okay, so now I'll go back to master and push this. All tests pass. Amazing. Look at us go. Okay. It's kind of tempting to just implement git commit as well, because we now have all the bits: you just call write-tree, as I just showed. Maybe we do that just for fun. So if we go to main and add a commit command... maybe we don't even take a parent hash or a tree hash. Yeah. Commit just takes a message. Then, in commit_tree, let's extract some of this out into a write_commit function that returns the hash of the commit and nothing else, so this now becomes: hash is write_commit(message, tree_hash, parent_hash). We make that pub(crate), the same thing we did in write_tree. Now we go back to main and say: tree_hash is commands::write_tree for Path::new("."). You can see now maybe why commands like write-tree are called plumbing commands in Git: they do one thing really well, and things like commit are just stringing the plumbing commands together. So now we have the tree hash. The parent hash is going to come from HEAD, and here we need a little bit of smarts. We do fs::read_to_string(".git/HEAD"), with a "read HEAD" context. Technically we'd want this to work even if .git/HEAD doesn't resolve, like when you haven't created any commits yet, but let's ignore that for right now and assume there's something in there. Now, it is possible for HEAD to contain a raw hash — so, if parent_hash.len() == 40? No, let's not do that. Let's do: if let Some(git_ref) = parent_hash.strip_prefix("ref: "), because that's what's in there; the target is then just a path under .git, something like refs/heads/master, and we just cat that. Then we anyhow::ensure! that parent_hash.len() == 40, with a message like "unknown type of head ref". So resolved is fs::read_to_string(format!(".git/{git_ref}")): if HEAD says refs/heads/master, we should be able to read .git/ plus that path to get the hash, which is what I did here. The context is "HEAD reference target", and with_context seems relevant because we want the ref name in the message. Then we should be able to just return resolved, I believe, and we can do all of that before we start constructing the tree. So now we have the tree hash, and the commit hash becomes commands::commit_tree::write_commit of the message we were given, the tree hash, and Some of the parent hash we just computed, with a "create commit" context. And remember, write-tree can return None, so we handle that: bail... actually, this isn't even a bail, it's just a return, printing that we're not committing an empty tree. Ah, and this needs a hex... decode? No, I'm being silly — it's a hex encode of the tree hash, and we'll give it a "bad tree hash" context.
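That HEAD-resolution dance is worth pinning down. Here's a small sketch of it, assuming anyhow for error handling as in the stream's code; the function name and the decision to ignore the "no commits yet" case are mine.

```rust
use anyhow::{ensure, Context};
use std::fs;

// Resolve HEAD to the current commit hash: if .git/HEAD is a symbolic ref
// ("ref: refs/heads/master"), follow it into .git/refs/...; otherwise HEAD
// itself holds a raw hash (a detached HEAD).
fn resolve_head() -> anyhow::Result<String> {
    let head = fs::read_to_string(".git/HEAD").context("read .git/HEAD")?;
    let head = head.trim(); // the file ends with a newline
    let hash = if let Some(git_ref) = head.strip_prefix("ref: ") {
        fs::read_to_string(format!(".git/{git_ref}"))
            .with_context(|| format!("read HEAD reference target {git_ref}"))?
    } else {
        head.to_string()
    };
    let hash = hash.trim().to_string();
    ensure!(hash.len() == 40, "unexpected content in HEAD: {hash}");
    Ok(hash)
}

fn main() -> anyhow::Result<()> {
    println!("HEAD is at {}", resolve_head()?);
    Ok(())
}
```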
So that gives us the commit hash. And now, if we want to be really bold, the next question is: which thing do we update? And that's going to be the thing HEAD points to. We could just print the commit hash here, but that wouldn't move us forward; what we really want is to make HEAD point at the new commit we made. And you don't actually want to update HEAD itself, because HEAD might be a ref — and if HEAD isn't a ref, it's a little unclear what to update, since it's pointing at a specific commit and you can't change that commit. So I think we require that HEAD is a ref, and otherwise bail with "refusing to commit onto detached HEAD", which is what Git calls that state: you've checked out a particular commit rather than a branch or some named head, and you can't update a commit. Chat: "HEAD is a symbolic ref; in the original Git it was a symlink." Oh, interesting. So that means here we'll call it head_ref, pull it out in an else branch so we actually have it, and then this no longer needs to be an if let — this becomes the parent hash — because, crucially, we can now do fs::write down here to put the commit hash into that file. And this is where it gets really interesting to see whether we got it right, because this might just completely corrupt my entire git repo. It's recoverable, though. Someone asks: any reason you use with_context instead of context with a format!? Yes: with context(format!(...)) you allocate the string even when there's no error and then immediately throw it away; with with_context the string is only built if there actually is an error. That's why I write those differently. Okay, I forget what git commit prints... where's the last place I ran git commit? Right up here, before the last push. Aha — it just prints this summary. I'm not going to try to reproduce that; it seems complicated. Let's just println! the hash, shall we? "HEAD is now at", maybe, plus the commit hash? Ah, good catch: the commit hash needs to be hex encoded. So in theory this lets me commit what's here. How do I want to do this? I'll commit this first with real git, "implement commit", and then do something like cargo run commit -m "empty commit". "No such file or directory: read HEAD reference target refs/heads/master." Why? Why? Ah — it's because there's a trailing newline. head_ref becomes head_ref.trim(). And I guess write_commit can just take &str here; this can be a &str, and this, so this can be message, and this can be this. All right, let's try that again. Okay: git show, "empty commit"; git log, "empty commit"; git show -p, an empty commit. We have a commit we made! Our commit works. What if I echo hello > world, then git add world (because we don't have git add), and then cargo run commit "added world"? It claims 079...; git log shows 079..., "added world"; git show, "added world"; and there's now a world file.
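That eager-versus-lazy context point is easy to demonstrate. A tiny illustration, assuming anyhow; the path and message here are made up.

```rust
use anyhow::{Context, Result};
use std::fs;

fn read_config(path: &str) -> Result<String> {
    // .context(...) takes its message eagerly, so
    //     .context(format!("read config at {path}"))
    // would build (and usually throw away) the String even on success.
    // .with_context(...) takes a closure, so the message is only built when the
    // underlying call actually fails.
    fs::read_to_string(path).with_context(|| format!("read config at {path}"))
}

fn main() {
    if let Err(e) = read_config("/definitely/not/a/real/path") {
        eprintln!("{e:#}");
    }
}
```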
Back to testing: if I reset --hard back to where we were, the world file is gone again, and our git commit did the right thing. Nice. We're just missing the author and the timestamp. Sure, why not. This part is annoying because properly you'd parse .git/config and such, but let's at least do email, name, and timestamp. Here's what we'll do: if let (Some(name), Some(email)) = ... and we'll grab those from std::env::var... what am I doing — var_os of the email one. I guess I'll do it this way. It's not perfect, but it'll at least do something kind of interesting. Then name.into_string() with a context... in fact, maybe the thing to do is to map the name through into_string if we really wanted to, but I don't really want to write that code. So: name and email, and the writeln! for the author line becomes name, email. And I think real Git lets you pass in which name and email to use for the author, but the committer is always set to your own credentials. Oh, var_os returns an OsString — that's fine. Anyhow, anyhow... whoops. Email, email, email. We can just ignore this and ignore this, this needs a bracket, and I guess this is really a map_err. OK, and then the time. This is just a Unix timestamp, I think — at least that's certainly what it looks like. SystemTime: so time is SystemTime::now() minus SystemTime::UNIX_EPOCH... and that's relative to the epoch, so it's going to be UTC, right? Isn't there a trick to doing this... oh, I can just do .duration_since. That's what I want. So that's the time — but it actually has to be time.as_secs(), I think. And duration_since returns a Result, because the current system time could be before the Unix epoch; I don't think we're going to run into that problem. It's a good question what this actually returns, though: will it really give me UTC? It's duration_since, but it doesn't tell me anything about the current timezone. That's fine; realistically we'd use something fancier that actually handled timezones, but for now I'm going to pretend this should be UTC. Ooh, is the timezone encoded as "UTC", or "+0000", or "Z"? Let's find out. This is going to be interesting. So if I now cargo run commit "added world" and git show -p... January 1st, 1970. Okay, I'm going to go with: that didn't work, or at least didn't do the right thing. So let's write "+0000" — git itself just prints +0000 there. All right, git add, cargo run, git show: 14:01 +0000. That's correct, because I'm at +1. Okay, great. So now it uses my name and email. And if I reset and then set the name environment variable to — to tie this all together — Inspector Gadget, and the email one to inspector@gadget.biz, then the commit is authored by Inspector Gadget. Nice. Okay, this has now ended up a little bit weird, so we'll go back to a good place, git add, and then run — still as Inspector Gadget — commit "set name, email, actual time". And it makes me happy that the commit that adds support for setting those is written by Inspector Gadget. Amazing. Good, push. This will fail at the very end because we haven't implemented git clone.
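For reference, here's roughly what that author/committer construction amounts to. The environment-variable names and the fallback values are illustrative, not necessarily what the stream used; the SystemTime-since-epoch part and the hard-coded +0000 offset are straight from the discussion above.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Build an "author"-style line: name, email, seconds since the Unix epoch, and
// a hard-coded +0000 timezone offset (real git records your local offset).
fn author_line() -> String {
    // Name and email from the environment rather than from .git/config; the
    // variable names and fallbacks here are made up for the example.
    let name = std::env::var("NAME").unwrap_or_else(|_| "Example".to_string());
    let email = std::env::var("EMAIL").unwrap_or_else(|_| "you@example.com".to_string());
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_secs();
    format!("{name} <{email}> {secs} +0000")
}

fn main() {
    println!("author {}", author_line());
}
```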
Okay: clone a repository. Given that this is supposed to be the hardest exercise, and way harder than the things we've done so far, I'm not going to attempt it four and a half hours into the stream. It might be something we do in a follow-up, or it's a good exercise for you to do. As a sort of follow-up to this, what I will do is push this code out so it's available: new repository, public, that's fine, create repository, copy, git remote add github, git push github master. It'll be here. Amazing. I'll put that in chat and in the video description as well, so it's easy to find. Okay, I think that's actually where we're going to end it. I don't want to start on git add, because setting up the staging area is its own different thing: the index is essentially an in-memory representation of your file system, separate from what's actually on disk, and it's what you then construct your trees from. Doing that is its own whole ordeal — a fun ordeal, just not one I want to do right now. Look, the repo got its own little language indicator. That's funny. What do I want to leave this with? I think trying to implement git add would be super interesting; it's not something I'm going to do right now, but it's a fun follow-up exercise. Same with actually going through and implementing git clone. We might do it in another video, or if I don't, it's a good chance for you to do it yourselves. As I mentioned, there's a referral link, also in the video description, where if you sign up through it you get access to all of the challenges for about seven days — so you could in theory clone my repo and see if you can do "clone a repository" within a week. Or you could actually pay for it, which would certainly make me happy through the referral, but whatever floats your boat. And as I also mentioned, CodeCrafters has the challenge on GitHub — I'll put the link in chat again, and it's in the video description — which has all of the exercises in raw form. It doesn't have the infrastructure for running the tests and getting the fireworks and stuff, but it has at least the raw text of the exercises, so you can go through it yourself if you can't pay for it or you go past the seven days. There are recent commits to it, one hour ago; I think someone is watching the stream and making changes as we go. I want to see what they are. This is the best kind of meta stream. Aha — this is the note about the thing we ran into: the SHA has to be computed over the uncompressed contents, and it includes the header. Nice, that's funny. What else do we have? "Ignore the .git directory" — yeah, that one's important when creating the entries. Nice. It's the stream talking to itself from the past. I know, right? Amazing. "How can I clone the repo if I haven't implemented clone?" It's true, it's an infinitely deep problem; I guess you might have to resort to that old git command, and then one day you can do it yourself. Okay, I think it's time for me to go eat some food. Thank you all for watching; I hope that was interesting. If you found it super cool, then maybe we'll do more of these; if you're done with challenges now, I'll find something else to do — I have plenty of stream ideas.
As I mentioned, join the Discord to get announcements for new streams and such a bit ahead of time, if you're not on Twitter or Mastodon or LinkedIn, which are the other places I post these. That's discord.johnhu.eu, which redirects to the Discord invite link, and there's an announcement channel there where new videos are announced. And if you sponsor me on Patreon or YouTube or, ideally, GitHub Sponsors — they take the lowest fees — you also get access to a couple of the channels there, like being able to suggest ideas for new streams, or just a general community chat. OK, thank you, and I'll see you later.
Info
Channel: Jon Gjengset
Views: 76,664
Keywords: rust, live-coding, git, vcs, version-control
Id: u0VotuGzD_w
Length: 269min 27sec (16167 seconds)
Published: Sat Mar 09 2024