Applying Nom's zero copy parsing with dhat heap profiling | Advent of Code 2021 in Rustlang Day 02

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
day two welcome back to the stream everybody so i'll probably just copy day one into day two that seems to make the most sense here the problem is okay now you need to figure out how to pilot this thing wait we don't know how to pilot this thing we're already in the sub seems like the submarine can take a series of commands like forward one down two up three forward increases the horizontal position by x units down increases the depth by x units up decreases the depth by x units so we just have x and y up and down forward and back hopefully not that since you're on a submarine down and up affect your depth and so they have the opposite result of what you might expect i don't like that at all can you just can you please just tell me down goes down and up goes up i would i would really prefer down going down and up going up already has a planned in course your puzzle input you should probably figure out where it's going forward five down five forward whatever okay cool horizontal position in depth both start at zero the steps above would modify them as follows forward five adds five to your horizontal position total of five down five adds five to your depth resulting in a value of five so we've just flipped the x axis so that y positive y is down it only shows positive values here even though there is up and down i'm not going to i'm not going to lean on the fact that that's all we have positive values input this is our input so we read the string uh input.txt we unwrap because if we can't get input.txt we just want to fail i could make that nicer i'm probably not going to this is probably not something we need to call at the moment so yeah i'm thinking that we want to do uh buffered lines for this later but i think i'm going to do that as an optimization i'm just going to read everything and parse everything to start with because it'll result in a little bit simpler code and i'm a huge believer in writing the simple stuff that works and then when you need to doing performance work taking things from owned to references taking things from many allocations to fewer what do you get if you multiply your final horizontal position by your final depth okay so should we do it like this struct submarine x i32 y i32 so we want integers we're going to do two 32 bit uh signed integers so we can go negative we should just do an enum in um direction forward i-32 up i-32 down i-32 is there a there's no backwards is there there's no leg position i guess position is going to be part two i do have my coffee today also check this out ready wait we've got a little comment highlighter now so if y'all have oh you know what i should have done that for the for the rest question i'm slipping it got me slipping okay so forward five down five forward eight so we're only adding to the horizontal position we don't have like a direction yet like backwards so we'll do that we'll do file.lines i forget a folder reduces the one with the initial state so i'm gonna import the fault here oh i'm gonna input the fault here i'm gonna derive the fault here i forgot how much streaming just like messes with your head like simple things like very basic things are just like what am i doing cargo let's just ron cargo run in part one expect a closure that takes two arguments so this is what the accumulator and the um item we'll call it this has to return the accumulator and then this will just do nothing at the moment so let final position equal file dot lines dot fold submarine default uh default for integers is zero i already know that so we're gonna get a submarine at position zero zero simple default is really nice i really i really appreciate the fact that rust provides a trait that lets you provide um default characteristics to a structure something like that so if you need to construct one and it has valid zero values you can imple default if it doesn't have valid zero values like you can't initialize it without setting things then don't implement it and you can write whatever new function you want okay so there's this let's do mute accumulator item.split can we split on a character i'm assuming we can i should just use dom here we'll do it later we'll do it as an optimization i have like a rock in my sock um item.split uh what a split return is splitting iterator i can probably just figure that out let things equals this turn my inlays on split car i don't really want to collect this into a vac what i'm going to for the moment so we'll do match items 0 this is going to be a little bit awkward because i don't have the enum implemented yet it's fine but i do want to do nom because that'll clean this up a lot i just want to show like the the basic way to do it like the split and um then we can move into something a little bit more complicated that cleans up this code a bunch to act uh should we have a like a function called forward let's just have a function called move well so the thing that i'm thinking about right now is should i do the abstraction because i feel like i'm going to need it or should i not and i'm just not going to do the abstraction and then let magnitude equals items 1 32 so we're using turbofish syntax i'm just gonna assume this works because if it doesn't work we're in pretty big trouble anyway i have so many errors let's do this i'll have to move that in a second but that's fine why is collect yelling at me method not a field oh i didn't call it does plus equals not exist there oh right that's actually um plus equals magnitude pattern okay so if we have anything else we're gonna panic because unhandled input i so if there's anything that's not forward up or down we're just gonna fail because um that means that we did something wrong because this is a structured problem with clean input oh totally forgot that enums could take a values last night yes um the fact that strucks and enums together form like sum and product types is one of my favorite things about rust you can you can stick structs and enums together with data and it's just it's so nice to just have this thing that's like forward plus this amount or forward plus this data and anyway it's one of my favorite things about rust i was panic a format thing yes yes it is i think that there were recent changes to panic to bring it in line in uh rust 2020 with the formatting behavior of the other formatting things but okay so all this uh we need to up let's just do this uh up here because why do we need to do that 50 billion times so up is going to be act.y this is minus minus the magnitude and then down is act.y plus magnitude and then we just return the accumulator final position is not used and then we can what debug out the final position oh because i don't i don't have the bug implemented on this that'll be fun actually i'll implement like display or something on that that'll be fun later um 1998 and 741. oh wait what's the actual thing we need to do impulse submarine finalize is what i'm going to call this function finalize takes a reference to self returns and i32 returns self.x times self.y that i think that's what we're doing right so we can just finalize that now and then allegedly this is our answer huzzah one gold star okay so [Laughter] let's go let's review part one for a second and then we'll move on to part two so what i did here is i come uh built a struct called submarine strub marine has two fields x and y x and y are i32s which are assigned 32-bit integers um integers are nice because we don't have integers in javascript or something like that javascript is all floats which is you know rough we derive default and debug debug is just something that you should probably derive on everything default is something that you have to make a decision about whether you want to derive or not sometimes default doesn't make sense so think about whether you have a default value um but if you do implement default then we get to use submarine colon colon default which is the function for the trait which is really nice we impulse submarine to add some functions to a submarine basically if we are going to like finalize the submarine and do the thing that we need to do to finish the problem we have to have a submarine we have to have the x and the y so i'm just going to input submarine self direct self.y multiple of them enum direction is something that i want to use so i wrote it down here so that i would think about it later where i want to use it is like right in here the way i just wrote this is kind of like simple rust is the way that i would put it it's like much more like javascript or like something like that than you might expect right like we have yes we have default here yes like we have to put mute in front of the accumulator so that we can actually mutate it but this is really just like split collect we have like an array of two things parse one of them into an i32 um if these strings match then do a thing if not then we totally messed up uh and then just return the value like this is this is fairly i would say standard code for a language that doesn't have enums that doesn't have really good parsing support that you might do this with regex in another language instead of doing split but yeah so i don't mean like simple as an easy i don't mean simple as in like um trivial or anything like that i mean simple as in like it's probably what you expect so instead of simple it's probably what you would write if you had no experience with russ but you had experience with like javascript or ruby or something like that um and because we impulled on submarine this final position is is a submarine when we return so you can see that with the type inlay here um and then we finalize it to get the multiplication multiplication so is fold like reducing js rust has fold and reduce so uh the thing that these are on is called an iterator uh iterator is a trait it's the thing that powers for loops and rust it's the thing that powers map and everything else so fold um lets us initialize an accumulator and then gives us like the first value of the second value in the third value and then we have reduce which allows us to treat the accumulator as the first item in the iterator so for fold we get our initialized value and the first item for our first iteration for reduce uh we start with the first and the second item so we have both we have options choose which one works for your case um based on your calculations the plan course so this is part two i think we'll do part two and then we'll go back and we'll optimize part one i think that's the way that i'm gonna work my solutions here based on your calculations the plan of course doesn't seem to make any sense you find the submarine manual and you discover that the process is actually slightly more complicated in addition to horizontal position and depth you'll need to track a third value aim so a theoretical z-axis i guess which also starts at zero the commands also mean something entirely different than you first thought down increases your aim by x units up decreases your aim by x units and forward does two things increases your horizontal position by x units increases your depth by your aim multiplied by x i'm gonna i'm gonna probably trip up on the wording of that but we'll keep going note that since you're on a submarine down and up do the opposite of what you might expect down means aiming in the po why why did why that's still like [Laughter] why do we need that to happen why can't we just match like what we would expect why do we need to randomly flip up and down uh now the above example does something different forward five adds five for a total of five so i'm just gonna start out by copying uh the other stuff that we have over here into part two i do kind of want to keep the dat stuff so i'm gonna copy kind of copy it down here and then move stuff around a little bit you know what i should do if we have enough time later i should do a cargo generate template that has just like dat and part one and part two and a library function in our submarine we're also going to have aim which is probably also going to be an i32 so based on your calculations planned course doesn't seem to make any sense you find the submarine manual and discover the process actually slightly more complicated so down x increases your aim by x units so down increases aim by x units up x decreases your aim by x units so this is these are both operating on aim now uh forward x increases horizontal position by x units and then increases depth by aim multiplied by x so let's start with the increases horizontal position by x units already do that uh increases death depth by your aim multiplied by x so act.aim times magnitude and this is act dot y plus equals that i have a really good idea for doing this later uh when we refactor a little bit let's see should we just run it and see if it works what are the chances that just by making two changes it just works huzzah we're done um we're not done we're gonna refactor but [Laughter] problem one solved problem two solved so it was just a couple of logic uh changes for all of the ups and the downs and the figuring stuff out things let's go back to part one i think what we're gonna do here is we're gonna start using nom and we're gonna parse into enums which i absolutely love doing and then what else are we gonna do here yeah nom to parson to enums which will make this a match on enums we should probably dat this before we do that and see what it impacts again i don't really feel like the runtime is really interesting here so maybe we don't do criterion today and we stick with that so dot allocator standard fs and then the thing here i'm just dropping in the dot stuff so we can now compile with release so cargo build release features that and then i want to target release part one and i'm gonna copy this so i would like to change this like the split feels very funky to me it doesn't feel very rusty it doesn't feel very we're not really handling any errors we kind of defer our errors to like the parsing and then like we don't know here if we actually covered the entire input so i want to try to do that and then the other thing that i want to try to do is do all of this incrementally because we can just read the whole thing down the line and we don't actually have to keep it in memory so i think that's going to have a larger impact on our total interestingly both of these allocate the same amount of heat i guess because we don't really do anything that different between them so let's start with doing um a read line by line in which i forget exactly how to do so rust read file by lines i don't want to use dot lines right um i want to use a buff reader i want to use like a buffered read on the file um so we are looking for read lines and then we consume it we're going to drop this in this is going to be file name we're not going to read to string we are going to pull in file so standard fs file um we are going to build pull in uh i o we're going to unwrap this because we don't have error handling at the moment we are going to not return okay here because that's not something that we're doing um and this overall is the thing that we need to do here and then methodnet found in buff reader file so we didn't pull in a trait is what i'm thinking let's go check the imports here pull this in i don't know why rust analyzer suddenly stopped merging my imports but i find it a little bit awkward a method cannot be called on a result string i or split so our item so when we do read i assume we get the chance to handle every yeah okay so we get the chance to handle every error or every line as a potential error so we are going to completely bail out of that at the moment um i guess that's the wrong thing did i read the message wrong item.split cannot be called on result string i o error uh which makes sense right so if we unwrap the item then we should have a string oh temporary freed while still in use that makes sense um we're just going to call it unwrapped it's going to be spotify unwrapped unwrapped equals this so we're actually allocating more by doing this let's take a look at the do i still have the dat viewer up no i don't dot rust online version that's what we want let me grab the file keep json that didn't work that's not what i meant to do all right so let's check out what's going on here total two blocks allocated at two insignificant too furious okay so six thousand nanoseconds ninety six thousand bytes in three thousand blocks we keep eight thousand in memory i wonder let me check the readme again let's let's look at what we have here we have 7002 blocks so i think allocating that extra string is actually hurting us okay alec allocate you know what i'm actually i think we should start doing criterium for these um because if we change the heap and we don't check the runtime then we can have an effect where we grow the heat but the runtime gets faster this is also not a very large file so i'm not even sure that that matters tgmax was 6000 here i didn't check the other one i should have pulled this up before we did that dropped uh dropped the json violin so this is vac allocation from collect from iterator.rs uh on 33. so at 33 final position i guess fold fold calls collect yeah the speed memory usage trade-off is real yeah for sure part one main closure fold fold calls collect under the hood why are we allocating a vec though i wonder if i write a for loop if this gets smaller what's the example they use here they use a for loop here buff read realign lines iterator folds so total is 8 000 but the max is 10 bytes max is 64 bytes so i guess actually we are we end up using more of the heap total but really what i should have done is checked the json file for the other one before i made these changes so let's let's do that quickly i'm going to do [Music] stash i'm going to stash this part one we will load our json i think our program runtime doubled too so i think actually read to string and the rust compiler already knows what to do here interestingly enough because look at how small this is this is one one giant allocation [Laughter] um for the fold and this one so the revised version is two allocations so this is still 80 000 bytes this is 87 and this is just 80 000. where does the extra 96 000 bytes come from is a question i have like there's 80 000 here and there's 87 83 here right in the first one there's 80 000 here but it says 87 726 okay is it more interesting to y'all to do a nom parser here i think i think we should do the parser um the dat stuff doesn't seem to be leading us in a very interesting direction the changes aren't very big and we can just run this again anyway so we are splitting collect here and i think that getting rid of this collect is probably helpful because we are instantiating a new vec for every line so getting rid of that is probably more worth more time than actually reading this in line by line because i don't think that leading us in line by line is um num because i know nothing about it and dat is still very magicky to me yeah okay let's do nom then there was just a new major not just a new major version but there is a new major version of num so i'm just going to copy the example straight in and we can do for direction i think is what we'll do i'll paste this down here um cargo add nom i don't think we need anything else uh i do want to build this quickly because if we don't then rust analyzer will just complain forever okay so this is the the parser so let's do direction here or move let's call it move i guess maybe i don't know we'll find out we're going to pass back a direction we are going to hmm what are we going to do here we're going to do opt i don't think we need any of these extra helper functions here we're gonna do um alt tag forward tag up tag down and alt needs to be brought in tag needs to be brought in uh we're gonna do bytes complete tag i believe this needs to be a tuple so because these can be different parsers it has to be a heterogeneous type which is what tuples are um so this will give us forward up or down we could write an additional helper function here actually for that specifically um but okay so basically what i'm doing here is i'm taking the input so move is our parser we're taking a string slice um i result is a non-specific result type that gives you a the input type the type that you're parsing into successfully and then it kind of hides the error type from us so we could define our own error type but we aren't going to at the moment but there are just know that there are three things defined in this type there's the input type which is a string slice there is the thing that we are parsing into which is the success return type and then there is the error type this is not what we need this is not what we need we need it's probably an i-32 in um so let's go look for that now let's just do i-32 and then we're gonna have to bring this in and it's not going to be from standard it's going to be from nom um magnitude i think we have to pull in i result too we don't have eye result in yet well adding nom add more to the heap stuff you were looking at earlier it should reduce it um if we write it correctly what are all in the what are the alt and tag functions um so nom is a parser combinator library this means that uh what we have is a bunch of functions that we can combine to build a larger parser so in this case when we say tag here switch over when we say tag here we're looking for the string forward in the input um and what alt does is it says there can be one of these so it can be forward it can be up or it can be down and alt takes basically a selection of potential parser combinators so in this case it's just one of three tags um but it could be any number of things and then we pass the input so this constructs for us another parser combinator basically it is the parser combinator that will parse one of tag forward or tag upward or down and we apply that to the input so we pass the input in as a function as an argument and this will either succeed or fail which we're handling with the question mark the question mark will return through this i result type um it's the same kind of thing that you would expect in any other question mark usage um in which case we also have the input and some stuff we're ignoring actually we're ignoring that right now right we'll call that dur to pull out so what we get back from these parts recombinators is the rest of the input and the value we parsed out so we take each time we're doing this we're shadowing the input variable it's just kind of a convention to do this because you have to write it a bunch so this is the full input right at the top we pass the full input to this parser combinator this parser combinator gives us the rest of the input and whatever it's successfully parsed out which we can then use to do something like this so we're basically it's going to be forward space whatever right so i'm going to try to tag a space and i'm going to pass in the rest of the input and that will give us the rest of the rest of the input and whatever it parsed out in this case is a space so we're not actually going to use it which we can then pass the rest of the rest of the input into our i32 parser which will give us the rest of the rest of the rest of the input uh and then the thing it parsed out so all all said and done here what we have here now is the dir which is the the direction forward up down or and our end the magnitude so if we match under and we get like forward up or down so you can see that this is basically the exact same stuff that we wrote up here forward up down but in this case we're going to do direction forward magnitude and instead of doing this down here let's do let let's call this result let's return result here so you'll see that the the function signature for all of these parts or combinators is rest of input thing right so when we write our parser combinator because move is yet another one it's just one that composes like all of the others uh we have to return the rest of the input and the result and then we could uh we could do a number of things here nom locate and nom supreme are also very interesting crates i think we'll cover those later um nom supreme is really interesting it allows you to keep it'll so nam nom supreme allows you to keep everything that happens and errors from the entire tree of parsing um so it gives you a lot more information and then nom locate gives you like the point uh in the document at which you have got an error or gotta value which they call spam so both very interesting cool things but i think we're going to keep it simple today i think we're just going to unwrap this or whatever um not unwrap if i was trying to write production level code here i would make sure that this came back as an error an error that i could handle why is move complaining oh cause move is a keyword right uh we can't use move um what do we call this i'm just gonna call it parts direction it's kind of not how everybody like you don't see this as being parse tag or this is being parse alt it's just tag and alt so people usually go that way but we can do parse direction so in here we can do parse direction item and i'm just going to unwrap this we're going to throw away the input and just take the direction so now we know we have a direction which gives us all of the magnitudes and everything yeah um nom is super powerful i think it's far more understandable than doing parsing in other ways like if you start looking at asts and you start looking at like how people are like hey this is how we do a compiler and whatnot you'll start hearing about like lex and yak and like all of that kind of stuff and that gets like super i don't know it's like super in-depth and confusing and like not where i would suggest people start with parsing in asts um parser combinators to me they are functions that return like things you can compose right so like when you have a function that parses something it returns you a value and you can compose that with other functions to parse more complicated values so for example the nom docs start off with a hex color basically right like like hash ffa bb kind of thing um and there's a tag and you compose tag and you compose this hex primary function and these are all just functions that take string slices and return like whether they successfully parse the value that they wanted to or not so you get to write like regular functions and you get to compose those functions together which is basically what we do in programming all the time anyway and that gives you these like really high level parsers that give you rich structs and things like that um so we can match der here now and this will be forward magnitude up magnitude okay so we've got direction forward now direction up and direction down and this is not something we need so that cleans up our code quite a bit because now we don't have to do this split thing up here we don't have to do any of the other stuff let's build this and run it and there we go so one four eight zero five one eight what was our answer for part one one four eight zero five one eight so we're good so not only are we good but we brought down our total usage to seven thousand bytes like like we were doing that before right we were looking at 80 000 bytes and we were like okay maybe we maybe if we do it like line by line or something like that but that didn't do much for us because the rest compiler was already smart enough to realize what we were doing and basically do a better job than we were able to but if we bring up the viewer for our new iteration here and we look at the heap we see that this is not only taking much shorter so it's taking 161 nanoseconds whereas the other two took what um six thousand and three thousand which to be fair and be perfectly transparent that is not a runtime execution profiling tool it's not we would be using criterion for that we should be using something else a different tool for that um so it's it's really just like context right it's not hey this runs faster or slower it's just context but we can see now that we have a total of 7 000 bytes so we got rid of the allocations for split and that seemed to be the big problem for us in terms of allocations anyway so we've got the 12 bytes for opening the file right for the read string stuff and then we've got 7 000 bytes for actually doing all of the work actually is that true 33 and 33 we don't allocate at all actually so um nom makes it super easy if you're using the right types uh to actually we we didn't allocate nom doesn't allocate anything here it's uh what is going to be the term for this is it zero copy it's zero yeah so i didn't mention this before but the description of what nom actually is is a parser combination library with a focus on safe parsing streaming patterns and as much as possible zero copy and you can see that here a thousand bytes off with knob no um 70 000 bytes off with them or 90 if you want to look at this so this is 87726 and this is 7726 so 80 000 bytes it doesn't allocate anything because it uses references nikki like the types we're using here the string slices we're passing around references to things that are already allocated we don't do any allocation here right so it's not like we're creating a capital s string allocating more on the heap returning this extra stuff like we're passing in a string slice we're returning a string slice and the only thing that we're doing here is pointing to references in pieces of that string yeah nom nom is super impressive piece of software um let me copy paste this into the readme part one with nom so what else do we have here really the only thing oh that's part two really the only thing is like instead of matching here we could move this onto a function on the submarine so instead of having this like this we could have sub dot we can't call it move um sub dot what's another word for move somebody give me another word for move swim our submarine swims that's impressive this is going to yell at me because i'm taking a mutable reference oh let's just make this unit we actually don't need to return this because it's muting and this is sub and then we return it so now look at this right this still does the same thing build release part 1 7726 we have changed nothing effectively so um now all of our code is encapsulated into these like five lines right or like the thing that we're doing is encapsulated into these five lines we take the lines we fold over the submarine let's get rid of the inlays here looks a little cleaner um we take the sub we take the line i guess we should call this line and not um we parse the direction out of the line if we get one we swim there and then we have the sub but yeah so like we just move this we're moving the direction in here we do nothing additionally on the heap so yeah i mean like nom and rust and trade implementations and things like that allow us to do a lot of like cleaner work here and then like so this is all basically the same stuff right so let's take let's do this let's let's go parse direction we'll pull it out we'll put into this library right um i'm going to copy and paste all of the used stuff we don't need fs here we don't need that here the submarine changes so i'm going to leave it here the direction does not change so i'm going to pull it into the library right and we need to make the use pub and then this is use what do i call this what's the what did i call the day to submarine now day two direction it's not great because the crate is the bin um and you think i need to define the library day two direction what did i do here so let's name day one oh no no no no horse direction we don't need nom here anymore they do need that and whatever so we've got our sub definition and what the sub should do in response to the things we've got the main stuff uh this could probably be pulled into okay so if we look in here we can pull in the same stuff now that we have an actual so parts direction and direction we'll get rid of direction here because it's the same thing parts direction um we need to redefine the sub so if we take this right and we have this submarine here and we add the swim but we move our logic back in so we're again we're changing the implementation of the submarine but we're using the same direction and the same parsing mechanics oh that's not sub itself and then i'll go down here and i'll take this final sub stuff and i can just paste this right in here because it's the same exact thing and now we've got this function which we could use right we could define this in our library and just say pass us a string of new lines and we'll deal with it but i decided not to in this case um the only changes here are the magnitude so if we build this and we run part two we get seven thousand seven hundred and twenty six bytes so where's the readme so they're the same but it is nice to uh confirm that they are indeed the same so yeah um i hope that i've stoked your interest in nam cause nom is absolutely freaking amazing um it's reasonably easy to use i will say i will caveat that with if you're trying to fully enumerate your errors you are going to have to put more work into learning what nom does with those errors you do have to map your result type into one that can be used in this eye result this i result is actually a three item type so you again the string slice for the input the output type and the error type the error type is not shown on screen here you can put that in there but that's the only caveat that i will have um i really love nom nom is really great uh it makes it really easy to write really performant really low allocation like parsers um i have an mdx implementation using nom that is oh dang it runs in nanoseconds yeah i don't know i just i absolutely love the way that rust comes together in uh the sum and the product types which allows us to do these like structured operations right like we know that we have one of these three now we don't have to just be like oh is there like an extra string or did i misspell this like no like we parse it it either parsed correctly or it didn't and then we have from that point on just these enums with the data that they need and then nom is just like mind-blowingly awesome when it comes to actually parsing stuff um so yeah
Info
Channel: chris biscardi
Views: 151
Rating: undefined out of 5
Keywords:
Id: XE5mOPZ9xRU
Channel Id: undefined
Length: 38min 28sec (2308 seconds)
Published: Fri Dec 03 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.