This is Better than C for Binary Files

Looks like we're live. Hello everyone, and welcome to yet another recreational programming session with Zozin. Forgive me my tracksuit; I'm just Russian. It's a little bit cold inside, so I had to put on something a little warmer than a t-shirt. There's also some problem with my windows, so it's cold inside. Outside, we can actually check how cold it is in Novosibirsk: minus 28. That's actually a little warmer than yesterday; I think yesterday it was almost minus 40. So it's getting warmer, which is actually quite nice. Let's make a little announcement: red circle, live on Twitch. And the question is, what are we doing today on the Twitch television website? Today we are hacking Erlang, that's right. I'm going to give the link to where we are, twitch.tv/tsoding, and I'm going to ping everyone who's interested in being pinged. And there we go, the stream has officially started. So today we are hacking Erlang. Does anybody know this beautiful programming language? It's a programming language for concurrent, distributed systems, it's a functional programming language, and it has a Prolog-like syntax. I've been asked to do something related to Erlang for quite some time, specifically in the form of Elixir. Elixir is the younger brother of Erlang. Elixir is... I don't know, let's actually google Elixir. Here's the website; I'm going to copy-paste the link in the chat, and for people on YouTube it's going to be in the description. So Elixir is this thing. Okay, it's under HTTPS. It's basically a younger brother of Erlang, and the way they are related is through the same virtual machine, because
Erlang is basically a managed programming language. It has its own virtual machine, kind of similar to Java, but Erlang's virtual machine is different, and Elixir compiles to that same virtual machine. By the way, it's the virtual machine I'm interested in today rather than a specific language: I want to explore the virtual machine of Erlang specifically. So here's Elixir. I don't know how different they are; to me, Elixir just looks like a Ruby clone that compiles to BEAM, and BEAM is the name of Erlang's virtual machine. It's called BEAM, and I think it's spelled like that. That's probably why I'm not really that interested in Elixir: for me it's just a different thing that compiles to BEAM. Essentially, Elixir is the Kotlin of Erlang. With the JVM, the original language was Java, and then you have Kotlin, which compiles to the same virtual machine. It's a similar situation with Elixir: the original language was Erlang, and Elixir is just a different language that compiles to the same virtual machine. Let's put Elixir in the references in the description too. So what's my goal today? I want to generate some BEAM files and try to parse them. I want to understand the BEAM format, maybe even find the sections that contain the bytecode and try to understand what's going on in that bytecode. Furthermore, if we have enough time, we can try to generate that bytecode. I'm not going to claim I'll manage that on today's stream, but ideally I would like to have a small stack-based programming language, a Porth-like thingy, that compiles down to the bytecode of the Erlang virtual machine. That would
have been kind of cool. Nothing super fancy, not the entirety of Porth, just the ability to do something like `34 35 + print`, take this string, compile it to a BEAM file, and then run it within the Erlang virtual machine. That is something I aspire to do today, but I'm not really sure I'll be able to; maybe the format of the Erlang virtual machine is so complicated that we won't manage. But I have a couple of interesting things that may help us understand that format a little bit better. If I knew for sure that I wouldn't be able to do it within the stream, I probably would not have started this stream in the first place. That means I know a little bit about how we're going to go about it, so the probability is rather high. Anyways, let's go ahead and try to write something in Erlang. I'm going to put it in a folder, I don't know, erlang hello, and we're going to create a file main.erl. If I'm not mistaken, the extension for Erlang is .erl, and Emacs recognizes it correctly: it is in fact Erlang. What you're supposed to do, if I'm not mistaken, is create a function. And here's the interesting thing: in terms of the entry point into the program, Erlang is closer to Python than to Rust, C, C++, or any other compiled language, in the sense that you just start up the environment, load something into it, and run that thing in the environment. As far as I understand, there's no standardized
entry point, just like in Python. If we have any professional Erlang developers in the chat, please correct me, I might be wrong, because I'm not an Erlang developer at all. Okay, so we're going to create a function. This is how we define it: here is the name of the function, here are the parentheses where you put the arguments, and then you put an arrow and start writing the body. How do you write the body? You write expressions that may have side effects: expression one, then the next expression separated by a comma, then the third, and so on, and you end the list of expressions with a dot. Erlang evaluates expression one, performing any side effects that happen within it, then proceeds to evaluate expressions two, three, and four, and the result of expression four becomes the result of the call to the function. So it is an expression-based language, just like OCaml or Haskell. That's basically the syntax of Erlang. It's kind of goofy, kind of silly, but that's because it's based on the syntax of Prolog; it's really close to Prolog. As far as I know, the first version of Erlang was essentially a library for Prolog. Then they realized Prolog is not a good language for this kind of stuff and implemented their own language, but they preserved the syntax because they had gotten used to it. That's why it looks like Prolog. In Prolog, by the way, you write a clause like this: you have some arguments in the clause head, and then you put this kind of
operator, `:-`, and after it you also put expressions, like that. So it's similar to Prolog, and you also end with a dot; it mimics the Prolog syntax, if I'm not mistaken. Okay, so if you want to write hello world... I don't quite remember how to print things in Erlang. Is there something like print? Let me search for "Erlang hello world". Getting started... okay, so you're supposed to do `io:fwrite`. As you can see, there is some sort of namespacing here: it has module names. Oh, by the way, `io:format`, I kind of vaguely remember that. Yeah, that's right, `io:format`, thank you. So let's use format instead: it's going to be hello world, and you put a dot at the end. If I understand correctly, Emacs has an Erlang shell. There we go, you can have an Erlang shell within Emacs, how about that. I suppose it just runs an external erl and lets you type this kind of stuff into it. To run the program, the first thing you have to do is compile that specific module, like this, and we cannot compile it because there's no module definition. So the first thing you need to do is say within the file that you're defining a module with a certain name. We're going to define a module with the same name as the file: the file is called main, so the module is also called main, `-module(main).` On top of that, you have to export the list of functions that you want to make visible outside of that module. So let's write the export, and I'm going to say, okay,
here's the list, and I have to say hello. And here comes an interesting thing: the way you describe signatures in Erlang is similar to how you do it in Prolog. If you've never programmed in Prolog, that probably doesn't tell you anything. Erlang is a dynamic language, so you don't assign types to arguments, but you can overload functions by the number of arguments. To say "I'm referring to the function that has zero arguments", you put `/0`: `hello/0` refers to the function hello that takes no arguments. You can have another version of hello that accepts one argument, and if you were referring to that function specifically, you would say `hello/1`. These are two different functions. So the format for referring to a function is name/arity. If you don't know what arity is, it's a mathematical term meaning the number of arguments of a function: f() has arity zero, f(x) has arity one, and f(x, y) has arity two. So you write name slash arity, and that's why I put `/0` in there, giving `-export([hello/0]).`: I indicated the arity, the number of arguments. "Number of arguments in math?" Yes, as far as I know arity is a mathematical term that leaked into programming. Developers of programming languages quite often use the word arity to describe the number of arguments of a function, because that's how it's done in math. All right, so we defined the function. Let's try to compile it one more time, and as you can see, it in fact compiled. Then we can call the function hello like this, `main:hello().`, and as you can see, it says hello world. So we basically compiled the Erlang file.
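The name/arity rule can be mimicked in Python, which has no overloading by argument count of its own. The sketch below is only an analogy and does not show how BEAM actually resolves calls. It uses a hypothetical registry keyed by `(name, arity)`, and the helper names `defun` and `call` are made up for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: a function is identified by (name, arity), the way
# Erlang treats hello/0 and hello/1 as two unrelated functions.
_registry: Dict[Tuple[str, int], Callable] = {}


def defun(name: str) -> Callable[[Callable], Callable]:
    """Register the decorated function under (name, number of parameters)."""
    def register(fn: Callable) -> Callable:
        _registry[(name, fn.__code__.co_argcount)] = fn
        return fn
    return register


def call(name: str, *args):
    """Dispatch on the name and on how many arguments were passed."""
    key = (name, len(args))
    if key not in _registry:
        raise NameError(f"undefined function {name}/{len(args)}")
    return _registry[key](*args)


@defun("hello")
def hello_0():
    return "hello world"


@defun("hello")
def hello_1(who):
    return f"hello {who}"
```

Here `call("hello")` gives "hello world" and `call("hello", "Erlang")` gives "hello Erlang". `call("hello", 1, 2)` raises NameError, roughly where Erlang would complain about an undefined hello/2.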
Compiling it, we created a BEAM module and loaded it into the REPL, and now we're able to run the whole thing. As far as I know, you can modify it, say to foo bar, just recompile, run `main:hello()` again, and the new version is picked up. One of the killer features of this language is that you can reload modules on the fly without restarting the whole system. These days you can't really surprise anybody with this kind of feature, because we've had dynamic languages for a while, but Erlang is a very old language. How old is it actually? Let's take a look at Wikipedia. At the time it was developed, a language where you can reload code on the fly was a revolutionary idea. Apparently it's 37 years old. It's older than Python, by the way; I don't quite remember, but I think Python is from around '90 or so. It's even older than me. It's not as old as C, but it's still a relatively old language. But here's an interesting thing: after we did that `c(main).`, look what happened. It created main.beam. As already mentioned, BEAM is the name of Erlang's virtual machine, so we can already try to analyze BEAM bytecode, because it's literally in front of us, right next to the source code of the hello world program we just compiled. And look at the size of this thing: 660 bytes. It's not even one kilobyte, so there's not that much in there to analyze, and it's already capable of producing hello world, which I think is pretty cool. So the task of writing a small compiler to the BEAM file format is not impossible, if this is all the data you need to generate for hello world. Let's just open the file, and it already looks like some sort of format. We can already see the magic number, the magic four bytes. The encoding in this view is probably horrible; I wonder if I can make it better. Maybe not. I'm not sure it got better, but I tried. So the first four bytes are FOR1, and this is probably the magic number of the BEAM bytecode file. Another interesting thing we may notice is that after FOR1 we have four non-ASCII bytes. Let's take a look at them: the leading bytes are zero, and only the lower bytes have something in them, which probably means it's some sort of size. That's a very common pattern for encoding a size in a binary file format. It would be nice to know what this size is equal to. We could do it in Python, but that's not interesting. What if we do it in Erlang? We already have an Erlang REPL, so maybe we can do all the calculations there.
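The same deduction can be written as a small Python sketch. Python is used purely for illustration; on stream the arithmetic happens in the Erlang REPL. The sketch assumes the layout inferred from the dump: the magic FOR1, a four-byte big-endian size, and then the form type BEAM. `read_beam_header` is a hypothetical helper, not part of any library.

```python
import struct


def read_beam_header(data: bytes) -> int:
    """Validate the first 12 bytes of a BEAM file and return its size field.

    Assumed layout: the magic b"FOR1", a big-endian unsigned 32-bit size,
    then the form type b"BEAM".
    """
    if len(data) < 12:
        raise ValueError("too short to be a BEAM file")
    # ">4sI4s": 4 raw bytes, one big-endian unsigned 32-bit int, 4 raw bytes.
    magic, size, form_type = struct.unpack(">4sI4s", data[:12])
    if magic != b"FOR1" or form_type != b"BEAM":
        raise ValueError(f"unexpected header: {magic!r}, {form_type!r}")
    # The size field counts every byte after itself, so it should equal the
    # file length minus the 4-byte magic and the 4-byte size field.
    if size != len(data) - 8:
        raise ValueError(f"size field {size} does not match length {len(data)} - 8")
    return size
```

For the 660-byte main.beam from the stream, `read_beam_header(open("main.beam", "rb").read())` should return 652, stored as the bytes 00 00 02 8C, since 652 + 8 = 660.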
How do you do hex literals in Erlang? That's a very interesting question. Or it could be a version number, by the way; that's also quite a common thing. Data types... hex... I don't really see it. Maybe there are some examples in here. This 65... no, this is probably a character or something. Oh, okay, this is actually kind of cool: you put a base, then a hash, then the value, so I can do something like `16#FF`, and there we go. Can I have something like this? Yeah, I can even have a very exotic base like 15, which is pretty cool. So this is how you do it. And this is actually kind of cool: in the majority of languages you can't start variable identifiers with a digit, because something that starts with a digit is usually a number, and this syntax follows that convention. The base comes first, and it is a number, so you can't confuse it with an identifier. And it's super explicit. I really like this syntax, more than `0x` or `0b` and stuff like that. Anyway, let's take these bytes, copy-paste them in here, and interpret them as a base-16 number. Why doesn't it work? Oh yeah, I'm an idiot, I'm supposed to put a dot at the end of the expression. 652. The size of the file is 660; if we subtract this number, we get eight, which is exactly the magic plus these four bytes that denote the size. So the field holds the size of the whole file except those eight bytes; it's basically the amount
of bytes until the end of the file. We already figured out the first eight bytes of the format without any specification. See, I don't even know where the specification is, but we already figured it out. Not bad. The next four bytes are another magic, BEAM, which probably means this entire file consists of different chunks, and that kind of magic indicates the type of the chunk. Maybe those chunks are a chunk of data, a chunk of code, and stuff like that. And we can clearly see both ASCII data and binary data within the file. The only thing we need to do is understand what the hell is going on with this file. I don't think we can go much further without a proper specification, so we need to find the specification of this file format somewhere. "How do we know it's not some constant that just happens to relate to the size of the file?" The probability of that is extremely low. Okay, so here's the thing: there is a very interesting piece of documentation online called The BEAM Book. It's basically a description of the Erlang runtime system, ERTS, and the BEAM virtual machine, and it's kept relatively up to date: the last modification was seven months ago. There's an online version; it's literally an online book. I hope I can open it. Okay, there we go. I'm not sure if there is a dark version of it. For people watching on YouTube, I'm going to put it in the description. So this
is The BEAM Book, and there is a very interesting chapter in it: Modules and the BEAM File Format. Sounds like exactly what we're interested in, doesn't it? Let's find out. This is actually a relatively big book that goes into the internals of how Erlang works; it's pretty hardcore, I'll tell you that. But the section we're interested in is modules and the BEAM file format. And there you go: the definitive source of information about the BEAM file format is, obviously, the source code of beam_lib.erl. Okay, so we can probably get some stuff from there. I wonder if it's even available from the REPL; maybe we can just load beam_lib. Okay, it's part of the OTP standard library, or whatever it is, which means we should be able to call some of these things. But there's a lot of stuff in there. The BEAM file format is based on the Interchange File Format (EA IFF), with two small changes, which we'll get to shortly. Okay, whatever. A BEAM file header has the following layout. Doesn't it look familiar? We have an IFF header: FOR1, then the size, then BEAM again. It's also big-endian. And it's written in a very interesting notation that comes from Erlang itself: these things in Erlang denote some sort of byte array, if I'm not mistaken. Erlang is actually quite capable of parsing binary formats; it has syntax and facilities for it. Let's google that: Erlang byte arrays? I think that's what they're called. Does anybody remember? Byte arrays, bit arrays... Yeah, I think it's bit
strings. Thank you so much, thank you, thank you. Let's google bit strings, because I think that's the feature used in the book. Yeah, there we go: it uses these angle brackets. Let's read a little more about it. I never really used this feature, but since the book uses it, we probably need to know it. The complete specification of the bit syntax is in the Erlang Reference Manual. In Erlang, a Bin is used for constructing binaries and matching binary patterns. Imagine a language having facilities to match binary patterns! A Bin is written with the following syntax: angle brackets around expression one, expression two, and so on, separated by commas, so you have n expressions in there. A Bin is a low-level sequence of bits or bytes, and its purpose is to enable construction of binaries, so you can construct a new binary like that, where all elements must be bound. Or, if you already have a binary, you can match it against a pattern and sort of parse it. That's pretty cool, isn't it? Here, Bin is bound and the elements are bound or unbound. Examples: a binary can be constructed from a set of constants or string literals. Let's see how we can do that. We can do something like 69. 420 is probably not going to work because it's more than a byte, but it kind of worked out anyway. So if I do just 420... this is ASCII, but then what exactly is 420? That's really interesting. That should have turned into several bytes, like two bytes. One thing we can do is print this out and look at it in xxd. And in fact it's three bytes. All right, so if I put something in here that is bigger than 255, it's
going to be basically two bytes. But here I defined three bytes, not two. Very interesting. In here I can also use strings and whatnot: if I do something like "ABC", there we go. What's fun is, what if I put UTF-8 in here, Cyrillic for instance? That's kind of funny: why did it turn out like that? Maybe it doesn't fully understand this. But okay. And you can convert a binary to a list and a list to a binary, stuff like that. Similarly, a binary can be constructed from a set of bound variables: you have A, B, and C, and you provide a colon in here, and this gives a binary of size four. I suppose you specify the number of bits that should be taken by that specific variable, if I'm not mistaken. And if we take a look at this thing... okay, this one is interesting, this starts to make sense now. This is probably the name of the variable, and this is the size. I don't know why it's four: if we specify the number of bits, it has to be 32, right? It's `4/unit:8`. I don't know why it's written like that, but it somehow defines four bytes. And what is big? Oh, big, it's probably endianness. Yeah, they say it's big-endian; you can also specify the endianness you want to match. That's actually a very powerful language for parsing binary formats: you can say, here are 32 bits, bind them to this specific variable and interpret them as big-endian. What the... This is actually quite surprising. This language was developed by Ericsson for telecom applications, and I suppose a lot of the formats used there are binary formats. That's probably why it has all of these powerful
facilities to match all of these things. Okay, I do understand this: this is basically the size of this variable in bits, and then this is the endianness. But if we put unit in the place of the endianness, what is that supposed to mean, and why can you put eight in there? I don't fully understand that; it's kind of weird. Let me see if there's more. Okay, so these are the sizes... Holy, look at that: you receive some sort of datagram, from UDP or something like that, and then you can do a case and pattern match it against this huge pattern, extracting the IP version, the header length, and so on. And because it's within a case, if it doesn't match, you can try matching it against something else. Yo, what the... and this is a language from the '80s. Oh, well, this is a good idea actually: 4 by 8 is 32, so that's basically what's going on here. You have eight nibbles... That's a really bizarre way of describing it, but maybe that's exactly what you want here; it expresses some sort of intent that I don't fully understand, but that's fine. Okay, lexical note... segments. So we have Value:Size/TypeSpecifierList. Okay, this is very interesting, they even have this example in here: `X:4/little-signed-integer-unit:8`. So it's a list of things you can have in there: little-endian, signed, integer, and unit. The unit size is given as unit:IntegerLiteral, the allowed range is 1 through 256, and it is multiplied by the size specifier to give the effective size of the segment. Aha. So essentially, you can provide the second size only if the type list contains
unit. So that's the unit, and you have eight units in here... All right: this element has a total size of 4 * 8 = 32 bits, and it contains a signed integer in little-endian order. Okay, so that's needed to express some sort of intent. Maybe the actual unit here is eight bits, and you have four units of size eight. Okay, I finally understand what they were trying to say: you have four units of size eight, and a unit is basically a byte, one character. Now I fully understand this pattern. That's super cool, actually. After the header, multiple chunks can be found. There we go, that's exactly what I was talking about: the file consists of several chunks, and the chunks have different types, Code, Atom, StrT, LitT, and so on, usually four characters each, as you can see. We can even take a look at what we have in our file: we have FOR1, the size of the file, and BEAM, and then we have AtU8. I have no idea what AtU8 is, and they don't really mention it here; it's somewhere in the "and so on". This file format prepends every area with the size of the following area, making it easy to parse the file directly while reading it from disk. Yeah, this is actually quite cool; it makes parsing super easy. You know precisely the size of the header, so you can read only the header and then the rest, or you can read the fixed-size chunk header, and then you know how much more you have to read after
that and what kind of chunk it is. That also makes it super easy to skip a chunk: you read the chunk header, see by its type that it's not something you're interested in, and you know exactly how much to skip, because it's in the size of the chunk. So you can basically stream-read the format. Or, since the file is not that big, you can read the whole thing into memory and skip around in it. But maybe at the time this was developed, in the '80s, memory was actually limited, so they would not read the whole file into memory; they would read it chunk by chunk and skip some chunks to avoid wasting memory. "Not enough memory to read the whole file", crazy. I mean, what were the memory sizes in the '80s? It's a language from 1986. What were RAM sizes back then? I don't know, I was not born yet. I was born in the '90s, and to be fair, in the '90s I didn't even know what a computer was; I only got a computer in the 2000s. "4 megabytes." "640 kilobytes were in the future." I think that's too much for the '80s, but I don't know, maybe. Anyway, "640 kilobytes ought to be enough for anyone", exactly. So, the file format prepends every area with its size, making it easy to parse the file directly while reading from disk. To illustrate the structure and content of BEAM files, we will write a small program that extracts all the chunks from a BEAM file. So Erlang is capable of parsing its own files. All right, for memory, the figures for '82 to '84: 1 kilobyte? So it's around... well, I mean, that's actually pretty good. I think you can fit relatively big programs in memory.
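The "read the chunk header, then jump over the payload" idea can be sketched in Python. This is not the Erlang extractor from The BEAM Book, just a rough equivalent written under stated assumptions. It assumes every chunk is a 4-byte ASCII id followed by a big-endian 32-bit payload size. It also assumes, as beam_lib treats BEAM files and unlike the 2-byte padding of plain IFF, that each payload is padded with zeros to a multiple of 4 bytes. `iter_beam_chunks` is a hypothetical name. The `>I` format reads the same 32-bit big-endian integer that Erlang's bit syntax would spell `Size:32/big`, or `Size:4/unit:8`.

```python
import struct
from typing import Iterator, Tuple


def iter_beam_chunks(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (chunk_id, payload) for every chunk in an in-memory BEAM file.

    Assumed layout: 12-byte header b"FOR1" + big-endian u32 size + b"BEAM",
    then chunks of 4-byte ASCII id + big-endian u32 payload size + payload,
    with each payload padded with zeros up to a multiple of 4 bytes.
    """
    if len(data) < 12:
        raise ValueError("too short to be a BEAM file")
    magic, size, form_type = struct.unpack(">4sI4s", data[:12])
    if magic != b"FOR1" or form_type != b"BEAM":
        raise ValueError("not a BEAM file")
    end = 8 + size  # the size field counts everything after itself
    if end > len(data):
        raise ValueError("file is shorter than its size field claims")
    offset = 12
    while offset + 8 <= end:
        chunk_id, chunk_size = struct.unpack(">4sI", data[offset:offset + 8])
        start = offset + 8
        yield chunk_id.decode("ascii"), data[start:start + chunk_size]
        # Jump over the payload and its padding without inspecting it:
        # round chunk_size up to the next multiple of 4.
        offset = start + ((chunk_size + 3) & ~3)
```

On the real file, `iter_beam_chunks(open("main.beam", "rb").read())` should start with the AtU8 chunk seen in the dump. The sketch reads the whole file into memory for simplicity. A streaming version would read 8 bytes per chunk and then `seek` past the padded payload, which is the memory-saving pattern discussed above.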
But I'm not 100% sure. Obviously, 600 bytes is basically hello world. Programs written in Erlang, especially the ones used in telecom applications, are probably very complicated and could be quite big, so it's kind of difficult to speculate about that. Anyway, okay: apparently reading a file in Erlang is as easy as this, so you can just read it. And as you can see, this is a tuple. You denote a tuple with curly braces, similar to how you denote a bit string with angle brackets, and it's a classical tuple like in functional languages. Let's try something like `{1, 2, 3}`. There you go, this is in fact a tuple. Reading a file returns a tuple of ok and File. And here's an interesting thing: you may notice that names with a capital letter and with a small letter are used side by side. This is because whatever starts with a capital letter is a variable, and whatever starts with a small letter is not a variable: it's a symbol, an actual value, similar to symbols in Lisp. So if I type ok, or yourmom, it's a valid expression in Erlang. But if you capitalize it, YourMom, it will say that the variable YourMom is unbound. You have to bind your mom first: you have to say something like `YourMom = 69.`, and only after your mom has been bound can you refer to that variable. So you can do symbolic computations with this language, similar to Lisp. Essentially, if you manage to read the file successfully, it returns the symbol ok; if you couldn't read the file, it will probably
return you a different symbol right so in this specific case you probably can treat them as enumerations or something like that we can try to read the file and see what's going to happen uh right so let's actually try to read that Beam file that we got I think it's kind of interesting so we're going to do main uh beam right and what do we get we got symbol okay and the beat string the beat string that we were talking about there you go so and you can start p and matching it right and in here as you can see we're first reading this entire thing and then pattern matching it with this bit string pattern but we could actually insert it in here and pattern match it simultaneously uh right you can do that I'm really curious what's going to happen if you try to provide something something that doesn't exist right it returns error right but it still Returns the Tuple of two elements it still returns Tuple of two elements which means that you can actually do a very interesting way of handling uh this kind of stuff right so let's actually read uh right you can do something like this and then you can do case of okay content and just start handling it right so to do uh and I I think you have to yeah yeah yeah so then in case of an error in case of an error right you can maybe print that error so you see this is the reason why um it returns you uh you know stuff like that so it makes it easy to P much by a successful operation and then by an operation that failed all right at least this is how I understand it I don't remember how to close the case by the way I don't remember how to close the case who remembers how to close the case orlong uh switch case right this you have to put something like end uh it doesn't automatically do this things so K is statements so let's actually Google it up uh so the K is off right but so that means you have to not put this thing as the yeah there we go so this is how you close this this specific thing uh but what's interesting is that if you do 
The way they do it in the book, though, if the file can't be read, this particular pattern just won't match, so it's going to be a runtime error: the match fails. Somebody says esac. I don't think that's the case; esac is a bash thing, not an Erlang thing. But yeah, there we go, so far so good. So far all of that kind of makes sense, doesn't it? As you can see, here we're just pattern matching this entire thing, and then we can return the size and the chunks. You can even say that this part is binary, so it's going to be the rest of the stuff. It's pretty cool. We can even take the file name as a parameter, and we can export the function like that; as you can see, I'm indicating that read has one argument, which is why I put 1 in there. Now let's recompile this specific module from main.erl, and it works fine: this is hello, and the module is actually able to read itself.
If I call it like that, as you can see, there is no function read/0, because we didn't provide any argument, but there is read/1, which accepts the file path. So we can provide main.beam, and there we go, we got this specific thing. The size is already bigger, because now the module contains this specific function, so it's now bigger. That's kind of funny, actually. Cool. What if I provide something that doesn't exist? That means this pattern match is not going to pass: exception error: no match of right hand side value {error, enoent}. Enoent means "error, no entry". And to be fair, this is a completely unreadable error. Honestly, if I saw an error like that, I would be like, what the hell is that? It does provide the location where it happened, though, and if this is a common way of handling errors in Erlang, you can kind of guess what's going on: if you see {error, enoent}, you may quickly guess that somebody's trying to read a file at that specific place, go there and inspect it, and there you go, we couldn't read the file. But honestly, it would be better if it just printed something more readable. To be fair, as far as I understand, this language was not created for writing end-user applications; you wouldn't write a user application that you would install on a desktop machine with it. It was made for servers and services, so maybe user-friendly error messages are not important and what's important is developer-friendly messages, if you know what I mean. Though, to be fair, that message is not developer friendly either. So then we also have different chunks in here: we provide the size and we provide the chunks and stuff like that. To be fair, I ran out of tea already, and it's getting a little bit cold in here. There's a reason why I'm wearing a tracksuit right now, not only because I'm Russian but also because it's
a little bit cold inside, so I need to refill my tea. I need hot tea to survive. So let's make a small break, and after the break we're going to continue parsing this entire thing and exploring this format. So far so good. I'm really glad that I'm learning something interesting and new, and apparently I discovered a pretty cool language for parsing binary formats that I may use in the future for something else. Having something like that in your toolbox is actually super useful. Parsing binary in Erlang is kind of pleasant, honestly. I can see myself using it outside of today's stream just to quickly parse some binary format; it's easier than in C. Erlang is more suitable for parsing binary formats than C, honestly. That's actually quite surprising; I didn't expect that. So anyway, let's make a small break.
All right, let's try to separate the hello world program and the program we're going to use for parsing that hello world program. I don't really like that the program keeps changing as I'm developing the parser for it; I want the test data to be stable. So I'm going to create hello.erl. The module is going to be hello, and we're simply going to export the hello function, and there you go, here's hello. What I'm thinking is that I want to rename main to something like beam: beam is responsible for parsing and stuff like that, and hello is responsible for just printing hello world. In beam we only export the reader, and in hello we only export hello. Okay, so let's go into the Erlang shell and compile. By the way, I wonder if I can do C-c C-l. There's usually a convention in Emacs modes that pressing C-c C-l loads the thing, but that didn't really work properly. I don't know how that's supposed to work; C-c C-l in this specific case just opens the REPL, but it doesn't reload the module. I wonder if there is something like erlang-compile. Okay, there is a function called erlang-compile; let's see what it does. It's C-c C-k, "compile module in current buffer". Okay, that makes it super easy. C-c C-k, and look at that: it literally called the compile function for me, in a very verbose way, with a full path and with the output directory set to the same path. That is a very verbose way of doing it, but if it works, why not? Not bad, actually pretty cool. So now I can call hello like this and it works, then change it to something like hello sailor, C-c C-k, boom, it works. I can be super effective with my Erlang application. Isn't that poggers, my dudes? I
think it is, in fact, poggers. Anyway, I can also recompile the main thing. Oh yeah, let's call it beam: beam read, and we're going to read hello.beam. We managed to read the entire thing, and it's actually bigger than the original hello world, surprisingly. Let me remove this... why is this thing bigger, though? I didn't expect that; it's kind of weird, but that's fine. So the next thing they do in the book is read the chunks: they have a function that reads the chunks, and they pass the chunks into it. What they do is match the name separately, character by character, N, A, M, E, which makes me question why they do that; it doesn't make any sense to me, but okay. One thing I can try is to write read_chunks and literally match the entire name at once, and I'm going to literally return whatever we managed to match. Then I call read_chunks. Maybe I should call it read_file, but I already have a different name for that. Okay: no function clause matching read_chunks. That is very interesting. Is that because I defined this function after the caller? What if I define it before... no, that's bizarre, honestly; it doesn't make any sense to me. Let me double check that I'm calling this function correctly. Yeah, I am. Oh, pattern match the rest! Yeah, yeah, thank you so much. You're right: a tail binary, because they have that in the book. By just saying Name, I said the binary has to be exactly four bytes, but that's not true; I need some sort of tail in there. You're right, that makes sense.
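The bug just fixed (a binary pattern with no tail only matches a binary of exactly that length) has a direct counterpart in Python's struct module: unpack demands a buffer of exactly the format's size, while unpack_from reads the prefix and tolerates trailing data. The sketch assumes the 4-byte name plus 32-bit big-endian size header used later; header_exact and header_with_tail are hypothetical names.

```python
import struct

# 4-byte chunk name + 32-bit big-endian size (assumed layout).
CHUNK_HEADER = struct.Struct(">4sI")


def header_exact(data: bytes) -> tuple:
    # Like a binary pattern with no tail: only matches data of exactly 8 bytes;
    # anything longer raises struct.error (compare "no function clause matching").
    return CHUNK_HEADER.unpack(data)


def header_with_tail(data: bytes) -> tuple:
    # Like adding Rest/binary to the pattern: read the header, keep the rest.
    name, size = CHUNK_HEADER.unpack_from(data)
    return name, size, data[CHUNK_HEADER.size:]
```

Returning the remaining bytes alongside the header is what makes the recursive "parse one chunk, continue on the rest" style possible.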
So here, let's actually try to extract that. Tail is unused, and I wonder if I can say underscore, like so. Yeah, that's an interesting way to ignore the rest of the binary. Let's run that, and we've got this kind of stuff. I wonder if I can convert it to a string; that would have been interesting, but I don't quite remember how. I suppose there was a function that allowed me to do that... I remember binary_to_list. Can I have binary_to_string? Something like binary_to_string of 69, 69, 69... no, there's no such thing, unfortunately. There should be something like that, because there is binary_to_list, which in our case... oh wait, a string is a list; strings are lists. Okay, right. Oh, it even prints it as a string, look at that: AtU8. Interestingly, what we probably want to do is accumulate the names of the chunks, and what's funny is that in the book they use recursion to do that. After the chunk name we have the size of the chunk, so we know how much we have to skip. Then "align each chunk on even 4 bytes"; I'm not sure how useful that is, but anyway. The idea is that here I have an accumulator, and initially I set the accumulator to the empty list. So this is tail-call recursion with an accumulator and stuff like that. And here's an interesting thing: read_chunks has two clauses, two patterns. It's kind of similar to Haskell in that sense. Essentially we consider two
situations: when we have this specific pattern in the binary, and when we have nothing in the binary. As soon as we have nothing in the binary, we probably want to return the accumulator. One of the things they do here is reverse the accumulator, because if you do recursion with an accumulator like that, you get stuff in reversed order, so they call lists:reverse in this case. We probably have to do something like that. There we go. Oh, and by the way, when you have several patterns per function, you separate them with a semicolon, so you indicate that this is a single definition of a single function even though there are several patterns in it. The first pattern is when you have something in the binary, and this one is when you've exhausted all of your input. What I want to do, in my opinion, is collect all of the names of the chunks, and continue the recursive call to read_chunks with the tail, but I need to skip size bytes. Chat asks: is this polymorphism? Polymorphism is kind of an overused word that doesn't really mean anything anymore. When you say polymorphism, what exactly do you mean? Depending on the context, people use this word to describe absolutely different things. Somebody says this is pattern matching. Yes, it is pattern matching, but whether it is polymorphism, I don't know; what do you mean by polymorphism? So how can we skip a certain number of bytes? They use align_by_four, and I don't really understand why you need that function. You have a size, and "align each chunk on even 4 bytes". Do they even say anything
about that? I think we need to read about this. "After the header multiple chunks can be found. The size of each chunk is aligned to a multiple of 4 and each chunk has its own header (below)." Aligned to a multiple of four. Okay, so it gives you the size in bytes, but then, at the end of the data, there's padding. You have the size, then the data, which is chunk-size bytes, and then additional padding from 0 to 3 bytes to align it to four bytes. So the chunk size contains the actual size in bytes, but you always have to keep the padding in mind. I guess it makes sense; it's an interesting way to do it. So you take the size, and this is a very interesting formula, by the way, how they align this stuff to four. Let's take a look at the formula they use; I'll paste the entire formula in here as a comment. Oh, this is how you do comments here, with a percent sign. Imagine you have bytes, here are the bytes, A A A A. What does it mean to align by four? Let's split all of the bytes into groups of four. As you can see, if the number of bytes is divisible by four, everything is aligned by four. If you have this number of bytes instead, it's not aligned: the number of bytes is not divisible by four, and to align it you need to pad it with additional zeros so that the total is divisible by four. That's what alignment means. But here's an interesting thing: you are given the number of A's, but you are not given the size of the padding. You know the number of A's, but you don't know the full amount including the padding. How can you figure it
out? Here is an interesting thing. If you divide this number of A's by four using integer division, you get 1, 2, 3, 4: the result is going to be four, and the small tail is lost. So if you have N and you divide it by four, you only get the four full groups, excluding the tail. To get the size of the tail, you have to use the mod operation, or percent. How do you do mod in Erlang? It's called remainder: rem. Okay, so you have to use rem. In many programming languages like C, JavaScript or Python, it's usually percent; you've probably seen the percent operator. It gives you this tail, the so-called remainder. So integer division gives you the number of full groups without the remainder, and rem gives you the size of the tail. That's basically the difference between div and rem. But you want to figure out the size of the whole thing, and you only know the size of the data. What do you do? Notice that the first thing they do here is add three. Let's consider several situations. The first situation is when everything is already aligned by four. Adding three turns it into this; then you divide by four, which gets rid of the extra, and you get precisely four groups. You didn't have anything left over, you did the division, and it went back to normal. The next situation is when you have one extra byte. You add three, effectively turning it into a full group, you divide by four, and you get the full thing, exactly what you want. Another
situation is when you have two extra bytes. You add three and get one extra; you divide by four, get rid of that remainder, and you get what you want. So essentially the only situation where you don't get the additional group is when everything is already aligned. That is why they add 4 minus 1: they extend the tail so that after the division the extra stuff is removed and you get exactly the full size, including the padding. That's how they align by four. Usually it's written not as 3 but as 4 minus 1, and you can even generalize it: if you want to align by some other number M, you put M there, so it becomes (N + M - 1) div M * M. That's how you align by any number. In our case, though, the function already says align by four, so we'll keep it like that. It's a little bit involved, but it's more about knowing the fundamental operation of division; there is nothing specific to computing or programming here. It just requires understanding how division works, which, to be fair, looking at the current state of the educational system in many countries, could be too much to ask. How many people who got out of high school truly understand how division works, intuitively, not at the level of memorizing tables, but actually intuitively understanding the operation of division? Not that many. Chat says "I had to teach myself", "same". Same, seriously. Even for me, when I came out of school I still didn't fully understand division until I started to do a lot of competitive
programming, and that helped me understand division intuitively. My intuitive understanding of division is basically thinking about things and dividing them into groups. Say you have a bunch of A's and you want to divide them by, for instance, five. What does it mean to divide this bunch of A's by five? You split them into groups of five: 1, 2, 3, 4, 5, this is the first group, then another group, another, another, another, and this is the last group. Then you count how many groups you've got: six of them. So the number of A's, let's call it N, divided by five is equal to six, and the remainder of the division is two: the stuff that didn't fit into a bucket of five. That's how I intuitively understand division. And what's funny is that this operation is sort of interchangeable: you can divide the same thing by six and get five. Let's divide all of this by six, like so, and you've got five groups, and the remainder is still the same. I really recommend playing with tangible objects that you can move around and shuffle; it's really helpful for understanding this. I think the problem is that people are quite often ashamed to explore such basic, fundamental things with such childish notions. I'm basically explaining division right now with apples and pears, literally; it's just that instead of apples and pears I'm using A's, but I could replace them with apples and pears.
I feel like we should not be ashamed to go back to elementary school and talk about basic arithmetic operations like that. The fact that people are ashamed to do that is part of the reason why we see grown-up people who do not understand division, because this kind of stuff literally has to be explained with apples and pears, even to adults, and there's nothing to be ashamed of. I think about division in terms of apples and pears myself, and I'm not ashamed of that. I think it's important: if we stay ashamed, we stay uneducated. And what's funny is that I worked at a company where we had an old-school system administrator, and he needed to set up cron to do a periodic thing, but to come up with the right formula he needed mod, and he couldn't understand mod. He came to me with a formula he'd Googled that used mod and said he literally couldn't understand it. He was older than me by about 10 years or so. I started explaining division and the mod operation to him exactly like this, and after about 20 minutes he understood it perfectly. He never came back to me with that question, and he configured everything perfectly. He just needed a 20-minute explanation at the level of apples and pears, and now he can do periodic operations in cron and stuff like that. And that's fine. We got to that point not because he's dumb; he's actually extremely smart. The education system sucks. So yeah, okay, we got that. Let's take a look at the next thing we need to do. We managed to align things in here.
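The "add 4 minus 1, then integer-divide" trick worked through above, generalized to any alignment M, fits in a couple of lines. This is a Python sketch: // and % here match Erlang's div and rem for the non-negative sizes involved (they differ for negative numbers, which never occur as chunk sizes). The names align_up and padding are hypothetical.

```python
def align_up(n: int, m: int = 4) -> int:
    """Round n up to the next multiple of m (assumes n >= 0 and m > 0).

    Adding m - 1 pushes any non-empty partial group over into a full group;
    the integer division then throws away whatever is left over.
    """
    return (n + m - 1) // m * m


def padding(n: int, m: int = 4) -> int:
    """How many filler bytes follow n bytes of data to reach alignment m."""
    return align_up(n, m) - n
```

The groups-and-leftovers picture from the A's example maps onto divmod: divmod(32, 5) is (6, 2) and divmod(32, 6) is (5, 2), six groups of five or five groups of six, with the same two left over either way. For chunk sizes 0 through 8, align_up gives 0, 4, 4, 4, 4, 8, 8, 8, 8.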
So we've got the sizes. One thing I really apologize for is that I keep switching between a bright page and a dark one; there's no way to put a dark mode on that site. People keep telling me to install Dark Reader, so let's install Dark Reader, I don't know. The last time I installed Dark Reader it made all of the websites slower for me, so I just uninstalled it. The internet is already too slow for me as it is, and making it slower is not helping. Is it enabled already? It's already on, so we're already in dark mode. Maybe I have to refresh the website. There we go. Is that better? We lost the anchor, but is that better now? Maybe. Okay, so one of the things we're doing here is matching the size. We've got the name, we already did that; then the chunk length, where we just align the size by four and get the actual length. And the funny thing in this pattern, which is actually kind of cool: when defining the size of a segment, you can also use a variable. You have a variable that matches the content, and the size can also be a variable, so you can just chop it like that. That is super cool. So we can say: here is the content of the chunk, here is the size of the chunk, and it's a binary, and I suppose if you put binary there, the size is in bytes, not bits. Then you say here is the rest, also binary: that's the tail. So there you go, you've got a tail. Here they're collecting all of the chunks, but what I want to do is collect only the names and the sizes.
Do we collect the aligned size? No, we collect the original, unaligned size, which is actually a good idea, I think. So let me put Rest in here. Adding an element to the front of the accumulator is done with this operator, the list cons: this is going to be the name, and this is the size. As you can see, we're calling read_chunks recursively. There are no loops in Erlang; you use recursion, tail-call recursion. You give a bunch of bytes to this function, we pattern match them, extracting the name of the chunk, the size, and the rest; we align the size of the chunk by four bytes, getting the actual size in bytes; then we take the tail, skip the chunk by its aligned length, and continue the recursion, collecting all of these things. It's kind of like how you would do it in Haskell; you parse things in Haskell in a similar way. That's actually kind of cool. All right, can we recompile this thing now? It didn't like it, because I provided this kind of stuff in there: variable Chunk is unused. That's fine, we can just put a placeholder there. What else do we have? Function read_chunks/2 already defined. That's because I have to put a semicolon there, since it's part of the same definition. And there you go, we managed to do that. Okay, we should now be able to extract all of the chunks from hello.beam, and here are all of the chunks, which is actually kind of cool. We get the size of the file and the chunks.
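The recursive read_chunks just described (match a name and size, skip the aligned payload, recurse on the rest, stop on the empty binary) can be mirrored in Python for comparison. This is a sketch under the same layout assumption as before (4-byte name, 32-bit big-endian size, payload padded to 4 bytes); read_chunks and align_by_four reuse the stream's names but are not the book's code.

```python
import struct
from typing import List, Optional, Tuple

CHUNK_HEADER = struct.Struct(">4sI")  # assumed: 4-byte name, 32-bit big-endian size


def align_by_four(n: int) -> int:
    return (n + 3) // 4 * 4


def read_chunks(data: bytes,
                acc: Optional[List[Tuple[str, int]]] = None) -> List[Tuple[str, int]]:
    """Collect (name, size) for every chunk, in file order."""
    if acc is None:
        acc = []
    if not data:
        # Second clause: the input is exhausted, return what we collected.
        return acc
    name, size = CHUNK_HEADER.unpack_from(data)
    acc.append((name.decode("latin-1"), size))  # keep the unaligned size
    # Skip header + payload + padding, then recurse on the rest.
    rest = data[CHUNK_HEADER.size + align_by_four(size):]
    return read_chunks(rest, acc)
```

Two design differences from the Erlang version: Python appends to the end of a list cheaply, so no final reverse is needed (Erlang prepends with the cons and calls lists:reverse), and Python does not eliminate tail calls, so for inputs with many chunks a plain while loop would be the more idiomatic choice.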
How do I get the length of a list? Is it lists:length? If we take a look at this, can I see all of the functions in there? Maybe there's a function specifically for the length of a list. Here is the module for lists; there should be something for length. Is it just len, or maybe size? There's flatlength... oh, but it's just length, without any module. All right, so we can do length, and boom: 13 chunks. There are literally 13 chunks in hello world. There's the chunk AtU8 with size 55, another chunk is Code with 75, and to be fair, maybe the chunk we're really interested in is the code. I want to see the bytecode and stuff like that. So maybe I'm going to include the actual content of the chunks as well; I think that's going to be very useful. So we have the name, the size, and the chunk. Let's recompile this and read it again, and there we go: here is the code, and we have some interesting stuff in there. Here are all of the chunks; it's pretty cool, honestly. Let me close some of this stuff, I don't care about that. What's that? A moderator, yeah. Now, because of Dark Reader, everything is extra slow; I'm just opening it up, and look how much time it takes to color this entire thing. That's why I removed Dark Reader a long time ago: it just makes my computer slow for no reason. Is it even worth it? I don't know. Anyway, I suppose the next thing we need to do is take a look at some other tables. Here is the list of chunks we got, nothing particularly special. Atom table chunk: either the chunk named Atom or the chunk named
AtU8 is mandatory. It contains all atoms referred to by the module. An atom, if I understand correctly, is the symbol. So these things are not called symbols; I called them symbols, but they're actually called atoms, and I suppose that's basically what they are. If I take a look at the atom chunk, it has the name Atom. Let's look at ours: it's called AtU8, and apparently there's only one here. So there's only AtU8 and no Atom. Maybe AtU8 is the one with UTF-8 encoding, something like that; I don't really know. It has Atom, the size, and then the number of atoms, and then basically the list of atoms. The length of a single atom is a single byte, which means the length of an atom cannot be bigger than 255. Then the atom name, atom-length units of 8 bits, so basically the characters, and this entire thing is repeated number-of-atoms times. That's a very interesting pattern, by the way: you can effectively repeat groups of things. But when you pattern match this, is it somehow going to bind to a single list variable or something like that? I'm not really sure, actually. And then you have padding, because this entire thing is padded to four. "The format of the AtU8 chunk is the same as above, except that the name of the chunk is AtU8." Okay, so the layout above was for Atom. "Let us add a decoder for the atom chunks to our Beam file reader." We can try to do that, but I'm not really interested; the stuff I'm interested in is actually the code. The chunk named ExpT, the export table, is mandatory and contains information about which functions are exported. Okay, that's very cool.
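Going back to the atom table: the book suggests adding a decoder for it, and the stream skips that step. Here is a Python sketch of one, assuming exactly the layout read out above (a 32-bit big-endian atom count, then for each atom a one-byte length followed by that many bytes of UTF-8, then padding). That layout is taken from the book as read on stream; other compiler versions may encode the table differently, so treat decode_atoms (a hypothetical name) as illustrative only.

```python
import struct
from typing import List


def decode_atoms(payload: bytes) -> List[str]:
    """Decode an atom-table payload: the bytes after the 8-byte chunk header.

    Assumed layout: 32-bit big-endian atom count, then per atom a one-byte
    length and that many bytes of UTF-8 text.
    """
    (count,) = struct.unpack_from(">I", payload, 0)
    atoms: List[str] = []
    offset = 4
    for _ in range(count):
        length = payload[offset]  # a single byte, so at most 255
        offset += 1
        atoms.append(payload[offset:offset + length].decode("utf-8"))
        offset += length
    # Bytes after the last atom are alignment padding and are ignored.
    return atoms
```

The "repeated group" in the spec becomes a loop that advances an offset; the count at the front is what tells the loop how many times to run.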
Essentially, here we have the exports, and that's probably what they are. That's pretty cool. Export table chunk, import table chunk: so another thing you can do is import things from different places. The import table is mandatory and contains information about which functions are imported. So if we take a look at all of these chunks: you have the atoms, you have the code, imports and exports; LitT is probably literals, string literals and stuff like that; then LocT, Attr for additional attributes, Line, Type, maybe additional system information. The most interesting one for me, to be fair, is still the code, and I would like to explore it a bit more. Code chunk: the chunk named Code contains the BEAM code for the module and is mandatory. The format of the chunk is name, size, SubSize (there's some sort of subsize), InstructionSet (which must match the code version in the emulator, so there are different instruction sets and stuff like that), OpcodeMax, LabelCount (labels are probably the things you can jump to, because it's sort of like assembly), FunctionCount, then the code itself, chunk size minus SubSize, all the remaining data, and then the padding. So let's read what SubSize is: the field SubSize stores the number of words before the code starts, which makes it possible to add new information fields to the Code chunk without breaking older loaders. Huh, but how big is it? Oh, you can decide. Okay, that's very cool, and actually quite funny: you can add additional fields and so on. So the code is the chunk size minus SubSize; you skip a little bit. All right, sorry. The InstructionSet field indicates which version of the instruction set the
file uses; the version number is increased if any instruction is changed in an incompatible way. The OpcodeMax field indicates the highest number of any opcode used in the code. New instructions can be added to the system in such a way that older loaders can still load newer files, as long as the instructions used in the file are within the range the loader knows about. Makes sense. Huh, that's interesting: they're using a very interesting trick for maintaining backward compatibility. I should probably take notes on that for the next time I'm developing a virtual machine, because these are pretty cool ideas: reserve extra space for future fields, and keep that OpcodeMax, so you can add new instructions while your file might be using a smaller range, and that's totally fine. The field LabelCount contains the number of labels, so that the loader can preallocate a label table of the right size in one call. The field FunctionCount contains the number of functions, so that the functions table can be preallocated efficiently. Yeah, that makes sense. The code field contains instructions chained together, where each instruction has the following format: the instruction code (one byte), then beam_asm:encode of each argument, repeated arity times. Okay, so you have a single instruction followed by a bunch of arguments, each encoded in a certain way by beam_asm. That is very interesting. Here the arity is hardcoded in a table which is generated from ops.tab by the genop script when the emulator is built from source. Okay, that's fine. The encoding produced by beam_asm:encode is explained below, in the compact term encoding section. So they have a special encoding of the arguments for each individual opcode. We can parse out the code chunk by adding the
following code to our program. Okay, that makes sense. Oof, that is very interesting. So I would like to work on parsing the chunk, but first I need to actually get it: I need to filter that chunk out of the list of chunks. So let's do the following: Size, Chunks, and I'm going to just load the BEAM file. Here is the bunch of chunks, and this is the size. Oh, why didn't it rebind them? "No match of right-hand side." Oh, because Size and Chunks are already bound variables, so you can't rebind them. This is something that really sucks about Erlang. Immutability? But this is not about immutability, it's about shadowing. That is totally fine in Rust, for instance. In Rust you can quite easily do... why didn't you enable Rust mode, Emacs? Stupid Emacs. So you can have this, and then you can have x + 35, and then you can print x, and if I'm not mistaken this is totally fine in Rust. Let's wait for Rust. No mutability whatsoever, by the way. You see? Where is the mutability? I'm not talking about mutability, I'm talking about shadowing. Erlang could have just allowed that. It didn't, and that sucks. So I need to Google: Erlang unbind variables in the REPL. How do you unbind variables in the interactive Erlang shell? Because this is another thing that sucks (Dark Reader, again). So, unbind: forget the value. What's the syntax, what should I do? Okay, this is a bad response. Uh-huh, so f(): all variables unbound; f(X): X unbound. Okay, I get it: use f() and f(Var).
I don't get it. So you assign this thing, and "all variables unbound" is f()... is f something from Erlang, f(X)? Or am I... it freaking is! Holy... they called a built-in function f. That is insane. I'm looking at the examples thinking f is just a random name, you know, like foo, like foo bar, and my brain just farts so loudly. What am I even looking at? Oh, it's a built-in. Good one, Erlang. All right, so we have Size, and that's totally fine. We have Chunks, here are the chunks, so let's see what we can do to filter things. I don't really care about that stuff; I'm interested in filter. Holy, that is so slow. filter: we provide the predicate and then the list. Here is the element. Why does it say T? This is such a weird signature. What is the double colon, is that a type? Yeah, I suppose it's a type, so even though Erlang is dynamically typed there are some sort of type annotations in this style, even though I didn't think that's proper Erlang syntax. In any case, we can do lists:filter, the predicate comes first, and how do I write functions? Can I do fun(X) and, let's say, it returns true? A boolean is either true or false; it's literally like a symbol, an atom. I'm sorry, I just can't with Dark Reader. This is literally the reason why I didn't install Dark Reader. The next time somebody asks me to just install Dark Reader: no, sorry. Okay, so it's either true or false. It's going to return true, and I probably have to put a dot in here, and there was a problem with the syntax. Do I have to put end in here? Okay, what I need to do, so I
probably have to put semicolons in here? Okay: Erlang anonymous functions. Show me an example, just show me an example. Okay, so we just put end in here. Yeah, that's precisely what I did, and I think it still says I have a syntax error. Okay, and if I put false in here, it returns nothing. So essentially what I want is to check the name: here we have Name, and we can say that Name is equal to "Code". Is that how you do this kind of stuff, I wonder? "No function clause matching"... ah, because one of them is actually the size. "No match of right-hand side": this has to be ==. Okay, there we go, we found the code. So this is how we can find the code chunk: we just filter it out. And essentially we can have "Code" in here, the size that we don't care about, and the code chunk itself; we can assign it like that. No match of right-hand side? Well, it's inside of a list. And we got the code in here. There we go, here is the binary of the code itself, and we can start parsing that specific thing. All right, so I've been streaming for two hours already and I didn't get to actually generating anything. But, honestly, I kind of want to. So I'm going to make a small break, make another cup of tea, and we're going to go for another hour and try to parse each individual instruction, because I want to get the list of instructions, if you know what I'm talking about. After we get the list of instructions, maybe we can generate some of them. That would be interesting. All right, so let's make some
break. And, um, okay, so let's actually try to parse the code chungus. Maybe I'm going to call it read_code, and this is where we're going to accept this entire thing. But when we filter out the code like we did in here, we don't get the name of the chunk and the size of the chunk, so we only have to parse the rest of these things. I might as well copy-paste this thing as a reference and keep it in here, so I don't have to switch to the documentation over and over again. So in here we have SubSize: we get the subsize of the chunk. Then we have InstructionSet, OpcodeMax, LabelCount, FunctionCount, and then we get sort of the rest of the stuff. Can you see what I'm doing here? You can't. So after FunctionCount we have the code and the padding and stuff like that. I just accidentally copy-pasted incorrectly, I should have done it like that: we can say Tail/binary, the rest of the stuff, which we don't really care about. What I want is to extract all of these values from here: I want to put SubSize in here, then InstructionSet like this. But if I just do that, I'll get a bunch of numbers, and it's really difficult to understand what each number means, which one is the function count, which one is something else, blah blah blah. So we need some sort of table, an associative array or whatnot, if you know what I'm talking about. So we have an unused Tail. I wonder if I can do something like in Rust, just an underscore in front of the name. Maybe I can't do that and it
literally has to be a bare underscore. "Function read_code is unused": ah, that's not what it's complaining about; it's complaining about this function being unused. We can go ahead and export it, with arity one, because it accepts only one argument. Okay, and prefixing with an underscore, as you can see, actually worked, so we can use that trick from Rust. Anyway, here's the code we've got from hello. I can do beam:read_code(Code) like so, and these are the numbers I've got. As you can see, they're not particularly useful. So we need some sort of table or associative array. Does Erlang have anything like that? Erlang associative array; if not, we can resort to a list of pairs. Comparison of programming languages (associative array) on Wikipedia, let's take a look at Erlang. Okay, literally a list of pairs, literally the thing I said, and they even have helper functions to find things in it. This is freaking hilarious. Anyways, so essentially we can do something like this. I wonder if we could automate some of that, but I don't think we can easily do that. So: subsize, instruction_set, opcode_max, label_count, function_count. They're using strings as the keys; I'm using atoms instead. Now if I do something like that, look what I got: subsize is 16 (what is going on, just a second), instruction set is zero, the maximum opcode is 169, the number of labels is seven, and the function count is three. Which is kind of weird, because I only exported one
function, hello. Maybe there are some other functions that are implicitly exported in there. I think I remember there were some sort of init functions in Erlang: a module init function, basically a function that is called when you load the module or something like that. "Preloaded code containing coordination of system startup": that's this module, and it's not that. Erlang list all functions of a module; let's go to Google. Get a list of a module's exported functions: if the module is named M, then M:module_info(exports). Aha, here they are: module_info. So I can do hello:module_info(), call this function, and it gives me information about the module. I suppose I can provide one of the keys in here: for instance, if I provide module like this, yeah, I can extract individual fields out of this thing. And this is basically exports. That is really funny; I didn't know that. Even though you export just one function, the actual exports also contain module_info, which gives you additional information about all these things. That explains why, when we read the code, the function count is actually three: there are two additional functions, module_info/0 and module_info/1, for introspection, for reflection of modules and stuff like that. I literally didn't know about that before the stream. People ask me how I learn things, how I know so much stuff: I just explore like that, right in front of you. I already learned a lot on today's stream. People know it: I mess around until I find out, and I found out, and that's actually pretty cool. But the reason why I even
assume that maybe there are implicit functions is because I actually encountered something like that before in my career. A bit of experience helps you explore things in the right direction, if you know what I'm talking about. The more experience you have, the easier it is to explore by messing around, because you can not only mess around, you can mess around directionally: you kind of know how exactly to mess around. The more information you have, the more efficient your messing around. More information, more good finds: that's a good way to put it, I suppose. Anyways, okay, good. So then we have the code, and what's funny is that we take the chunk size and subtract the subsize, and that is basically the rest of the remaining data. But that is kind of not true, is it? The actual code comes after the subsize, doesn't it? Spelling it out like that implies the code comes right after the function count, and then you would have to include the subsize within the padding, but the padding is said to be from 0 to 3 bytes. So this is a very weird representation of how the code is laid out. I suppose it's more pseudocode than a real pattern, at least that's what I would assume. The field SubSize stores the number of words before the code starts, so the subsize must be somewhere here: after FunctionCount there should be something like an extension, SubSize-worth of binary, something like that. I think that's what it has to be. I don't really know why they didn't properly put that in there, I don't fully understand, but anyway, maybe we need to take a look at
their code. So here, parse_chunks: SubSize, Chunk... oh, maybe, by the way... ah, okay, the subsize probably includes all of these things. In our case the subsize was 16, and what do we have in here? Four 32-bit values. I don't need Python, I don't know why I reached for Python. Four multiplied by four is 16. So the subsize tells us the size of this header, which means the 16 here looks like a byte count rather than a word count. That is really funny, actually. SubSize is the size of this table of meta-information about the code, and that's why there is no additional padding in here, you know what I'm talking about. But we still need to take it into account, because maybe in the future our loader is going to encounter a new version with additional fields in here. So what we have to do is read all of that information and just skip SubSize bytes. That's probably what we have to do, and it makes sense. Interestingly, we could even go further and match only 16: if we encounter anything that is not 16, we say we can't load that code. That would have been interesting. Funny enough, look at how they implemented it. They've done a very funny thing: they accept the list of different chunks, so their parse_chunks function can accept all of the chunks that we extracted from the file. That is a really interesting way of doing that, I really like it. So they have parse_chunks, and if you encounter a chunk that you're aware of, you parse it and transform it
into something else; if you encounter a chunk that you don't know yet, you just leave it as it is. That is a very good way of approaching it. I really like this book, and I think I want to read more of it: I'm looking at just one chapter, and that one chapter contains a lot of useful information, and the code is organized in a really nice way, so you can learn a lot from it. That's already better than what I'm trying to come up with. As you can see here, they read the chunks, and then do they parse the chunks afterwards? It's sort of a separate function. Do they use them anywhere? Chunks... ah, yeah, okay, I see: they read the chunks and then they parse the chunks. That makes a lot of sense. Maybe we should do the same thing; honestly, I think it is in fact kind of cool. So, parse_chunks. We have read_code; maybe this stuff is not particularly useful anymore, but that's totally fine. parse_chunks: we accept a single chunk and then the rest of the chunks, so we can do something like this. If we encounter an empty list, we just transform it into an empty list, nothing in particular. It's funny how Emacs automatically supports that. I can try to do that again, look at that: parse_chunks, boom, the next clause. Can your mouse pad do that? I don't think so. Anyway, here's the rest. What they do is build an accumulator out of that. I'm not sure if I want to do it through an accumulator... maybe I should build an accumulator. Okay, so let's do it like
that. So we're returning an accumulator, but the list is going to be reversed. And in here, if we encounter a chunk that we've never seen before, we simply pass it along: this is going to be Rest, and then the chunk goes onto the accumulator. And once we've processed all of them, we just return the accumulator. I think that's a good way to do all of that. We don't really need to export read_code anymore, because I can turn read_code into one of the cases in parse_chunks. So in parse_chunks, what do I encounter? A chunk with the name "Code", with a size, and then some content in here, and that content is basically these things, so I can just move them in here. When I encounter those things, I build this specific associative array and return that. I might as well put this stuff in here. Yeah, that is a little bit better. Essentially, the chunks that we don't know we leave alone, and the chunks we do know we transform into something useful that we can maybe use later, like this one, for example. I wonder if I can format that a little bit better, because I don't like how long it is. Huh, that's not bad, that's a really readable pattern. "Look at that chunky code from the BEAM book." Sounds tasty, yeah, it does. Okay, it doesn't compile. Okay, it's cool, yikes. What do we have in here? A head mismatch. We've got a head mismatch because we need to handle the rest of these things: we need to keep calling this function on Rest, and essentially what we're doing here is just
pushing it onto the accumulator. That's how we're going to do all of that. Maybe I can take this entire thing and assign it to a variable, because I don't want a very chunky expression like that. Let's assign this to Info, or maybe we're going to call it CodeInfo. So this is our small CodeInfo, that's pretty cool, and then we continue parsing, and I just push CodeInfo onto the accumulator like so. Let's try to compile that. What, you don't like it? Head mismatch again, really? For real? Ah, yes, okay, I see: I should also accept an accumulator. There we go, that makes sense; that's why there was a mismatch. Now we're talking. Now we're freaking talking. I don't really care about that specific function, so I'm going to just mark it like this, and we're almost there. "Function parse_chunks/2 is unused", which is kind of... you tell me. Maybe we have to put it at the end somewhere for this to work? That's kind of bizarre; don't I use this function already? I use it in here, and I export it, so I don't know why you're complaining. parse_chunks... ah, I see, because we have to do it like that. Okay, finally I managed to compile all that. It's kind of difficult to compile for a dynamic language, isn't it? It's a dynamically typed language and I'm struggling to compile it, which is quite funny in my opinion, quite ironic as well. All right, so now I can do beam:read on hello.beam. Oh yeah, I'm supposed to put a dot in here instead of this thing. Undefined function, boom: okay, so it's just beam:read. All right, we've got a bunch of chunks we don't care about, and they are unparsed.
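The parse_chunks pattern described above (transform the chunks you recognize, pass everything else through untouched) can be sketched in Python as well. This is an illustrative translation, not the book's Erlang code: a plain loop replaces the recursive accumulator, so nothing needs reversing, and a dict stands in for the list of key/value pairs. It follows the reading from the stream that SubSize is a byte count (16 for the four 32-bit header fields), so the header is read from the first 16 bytes of the info block and anything beyond that is kept as unparsed new_info. The chunk data is assumed to be exactly Size bytes or more (padding may follow), so the opcodes are whatever comes after the 4-byte SubSize field and the SubSize bytes of info, i.e. Size - 4 - SubSize bytes; that arithmetic is worth checking against a real .beam file. The function names are hypothetical.

```python
import struct


def parse_code_chunk(size: int, data: bytes) -> dict:
    """Split a Code chunk body into its header fields and the raw opcode bytes."""
    (sub_size,) = struct.unpack_from(">I", data, 0)
    if sub_size < 16:
        raise ValueError("SubSize too small for the four known header fields")
    info = data[4:4 + sub_size]
    instruction_set, opcode_max, label_count, function_count = struct.unpack_from(">4I", info, 0)
    return {
        "subsize": sub_size,
        "instruction_set": instruction_set,
        "opcode_max": opcode_max,
        "label_count": label_count,
        "function_count": function_count,
        # Header fields from a newer format version: kept, not interpreted.
        "new_info": info[16:],
        # Everything after the SubSize field and the info block, up to Size bytes.
        "opcodes": data[4 + sub_size:size],
    }


def parse_chunks(chunks: list[tuple[str, int, bytes]]) -> list[tuple]:
    """Parse the chunks we understand; pass every other chunk through unchanged."""
    parsed = []
    for name, size, data in chunks:
        if name == "Code":
            parsed.append(("code", parse_code_chunk(size, data)))
        else:
            parsed.append((name, size, data))
    return parsed


# Header values as reported on stream for hello.beam; the opcode bytes are made up.
header = struct.pack(">5I", 16, 0, 169, 7, 3)
code_body = header + bytes([1, 0x15, 3])
chunks = [("AtU8", 4, b"\x00\x00\x00\x00"), ("Code", len(code_body), code_body)]
kind, code = parse_chunks(chunks)[1]
print(kind, code["opcode_max"], code["function_count"], list(code["opcodes"]))  # code 169 3 [1, 21, 3]
```

Because unknown chunks come back exactly as they went in, adding support for another chunk later is just one more branch in parse_chunks.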
The chunks that we do care about, like the Code chunk here, are kind of parsed already: we extracted all of these fields. That's exactly what I wanted. So the question is how we're going to parse the code. In the book, they take the info, basically group InstructionSet, OpcodeMax, LabelCount and FunctionCount into a single info thing, and then the code, which is quite cool, I think. Then they compute OpcodeSize as Size minus SubSize minus 8. I don't know why they subtract eight here. "8 is the size of ChunkSize and SubSize"... what is that, for real? Okay, so it's supposed to be the size of the chunk size and subsize fields. Then they extract the opcodes and parse them in a separate function, which is a reasonable thing to do. I really like this idea. So we have SubSize, then Info, and you say that Info is a binary of size SubSize. That's a really funny way of doing it: you match SubSize and then instantly use it. This is kind of cool, the fact that you can do that in Erlang: you can match a variable and instantly use that variable in the next segment of the same pattern. I think that's a thing you can do. Let's actually see; maybe you can't, and I'm being too cocky about it. So, SubSize: we can probably get rid of all of that stuff in here and say this is the Info. I don't know what I'm doing here. Let's try to read that, and yep, you can do that. That is very cool: you can match a variable, extract information and instantly use that variable in the
next match. So once you know how many bytes you have to match, you can use that in the sibling segment right after it. That is so cool. I'm telling you, parsing binary is easier in Erlang than in C. How crazy is that? So after the Info we get the code. Personally I don't like that they call that thing Code; well, technically it is code, so maybe that's fine. The next thing they do is Size minus SubSize minus 8, so we can just take that. Hmm, I still don't fully understand how that works. This is the full size of the whole chunk, and it does include the subsize; okay, that's probably why they do it like that. Personally I would like to write it like this, to emphasize that we're subtracting the size of the info, which is SubSize, and also the size of the SubSize field itself, because Size includes literally everything in here. After that we're just parsing out the code: there's OpcodeSize, we get OpCodes and then the alignment binary and stuff like that. So maybe we can just return OpCodes like so. Let's see if we can compile this entire thing. Info is not used, which is kind of funny, because they pattern match Info... oh, let's see what's going on: parse_code_info(Info), and then the opcodes. I see what's going on: we already saw the info, so we probably don't have to parse that thing again. Something about the type binary being undefined... okay. All right, so here are the opcodes, and I'm more interested in parsing opcodes than the code info. And that's kind of where the chapter is done: the encoding produced by beam_asm:encode is explained below, in the compact term
encoding section. Okay, so they omit the parsing of specific opcodes in this section completely. But we still managed to extract the code section out of this entire thing, so the only thing left is to start parsing each individual instruction in there. All right, I think I'm going to stop the stream here. I'm going to schedule another stream, and that next stream is going to be entirely dedicated to parsing opcodes and maybe even generating opcodes. I'll try to prepare, and maybe on that second stream we're going to actually write a simple programming language that compiles down to these opcodes. Today's stream was dedicated to exploring the BEAM format, and I think we did it quite successfully. We also learned enough Erlang to do that efficiently, so I think it was extremely educational. I wouldn't say that today's stream was a failure; it was actually a win. We learned a lot of stuff, and I can see myself using Erlang in the future just for parsing binary; it's actually very convenient. So yeah, that was pretty cool. I guess that's it for today. Thanks to everyone who's watching right now, I really appreciate it. Have a good one, and I'll see you on the next recreational programming session, where we're going to continue exploring the BEAM format and hopefully create a simple language that compiles down to it. The ultimate win is going to be creating a BEAM file out of our custom language, then loading it with the Erlang REPL, running it, and having it say something that we said in the original language. That would be an ultimate win, I think. We'll see how it goes. Okay, thanks everyone, I love you all.
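The stream stops right at the compact term encoding section, so here is a hedged sketch of what the next step could look like, again in Python. As read on stream, each instruction is one opcode byte followed by arity operands. In this sketch, each operand's low three bits are a tag (u, i, a, x, y, f, h, z, the tag letters conventionally used for BEAM operands); a clear bit 3 means the value sits in the top four bits, and bit 3 set with bit 4 clear means an 11-bit value split across this byte and the next. Larger values and the extended z tag are deliberately left unimplemented. The three-entry ARITY table (label/1, func_info/3, int_code_end/0) is a small excerpt assumed from OTP's genop.tab; a real decoder needs the full table. decode_operand and decode_instructions are hypothetical names.

```python
# Tag letters for the low three bits of an operand byte.
TAGS = ("u", "i", "a", "x", "y", "f", "h", "z")

# Tiny excerpt of opcode -> (name, arity); assumed from OTP's genop.tab.
ARITY = {1: ("label", 1), 2: ("func_info", 3), 3: ("int_code_end", 0)}


def decode_operand(code: bytes, offset: int) -> tuple[tuple[str, int], int]:
    """Decode one operand at offset; return ((tag, value), next_offset)."""
    first = code[offset]
    tag = TAGS[first & 0b111]
    if tag == "z":
        raise NotImplementedError("extended (z) operands are not handled")
    if first & 0b1000 == 0:
        # Bit 3 clear: value 0..15 stored in the top four bits.
        return (tag, first >> 4), offset + 1
    if first & 0b10000 == 0:
        # Bit 3 set, bit 4 clear: 11-bit value, high 3 bits here, low 8 in the next byte.
        return (tag, ((first >> 5) << 8) | code[offset + 1]), offset + 2
    raise NotImplementedError("values needing more than 11 bits are not handled")


def decode_instructions(code: bytes) -> list[tuple[str, list[tuple[str, int]]]]:
    """Decode opcode bytes into (name, operands) pairs, stopping at int_code_end."""
    instructions = []
    offset = 0
    while offset < len(code):
        opcode = code[offset]
        offset += 1
        if opcode not in ARITY:
            raise NotImplementedError(f"opcode {opcode} is not in this excerpt")
        name, arity = ARITY[opcode]
        operands = []
        for _ in range(arity):
            operand, offset = decode_operand(code, offset)
            operands.append(operand)
        instructions.append((name, operands))
        if name == "int_code_end":
            break  # anything after this is treated as padding
    return instructions


sample = bytes([1, 0x15, 2, 0x12, 0x22, 0x00, 3])
print(decode_instructions(sample))
# [('label', [('f', 1)]), ('func_info', [('a', 1), ('a', 2), ('u', 0)]), ('int_code_end', [])]
```

Feeding it the opcodes bytes from a real Code chunk would quickly hit opcodes outside the excerpt; that is the point where the full arity table and the remaining encoding forms come in.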
Info
Channel: Tsoding Daily
Views: 35,346
Id: F4NM1N2D5-Q
Length: 127min 20sec (7640 seconds)
Published: Sat Dec 16 2023