BINARY vs TEXT File Serialization

Captions
Serialization: the art of transforming data into some other form. Specifically, in this video we're going to be talking about serialization in the context of files, and also in the context of game engine development and game dev. I'll be using Hazel, my game engine, a lot for examples here. I'm not talking about any kind of network or memory streams or anything; we're strictly going to be talking about files. I might touch on other forms later, because that kind of fits into Hazel's serialization system as a whole, but specifically: how do we get data to and from disk in various formats, and what are some of the decisions behind why we do it the way that we do?

So the first thing you have to realize about serialization is that there are basically two different types: text and binary serialization. We can write a text file or we can write a binary file. Now, put simply, what is text? Text is: you open up Notepad and you can write some stuff here. This is what we call a text file. Fundamentally, it's a file that I can open up in Notepad, I can open it up in a text editor, and what I see here is just plain text, interpreted usually in something like UTF-8 or ASCII or whatever. The goal with text-based formats is that I want it to be readable by a human, because humans read text.

This, on the other hand, the latest game that I made for Ludum Dare 55, is what we call a binary file. Specifically, it's an .exe file. What happens if I try and open this up in Notepad? I mean, it kind of looks like text as well, I guess you could call it that, and if you look carefully you can actually see it does contain text. It makes much more sense to just drop this into a hex editor such as HxD. That's my favorite one; I'll leave a link to that in the description below, and it's free. And then this is what we're greeted with. You can see we can still see text, but this is what we call binary data. Now yes, you smart ones out there who look at this and say, "Why, that's not binary, binary is zeros and ones": yes, binary is zeros and ones. What this is is
those zeros and ones, but displayed in hexadecimal format, a base-16 number format, versus a base-2 number format like binary, because that would be extremely hard to read. I mean, this is not easy to read either, because we're used to text, as I mentioned earlier in this video. However, if all this data was literally expressed as zeros and ones here, it would take four times as much space and it would just be a lot harder to read, because each one of these hexadecimal digits corresponds to four bits, which is why one of these little groupings here represents one byte of data. That's why the max value is FF, which is 255, which is 2 to the power of 8 (8 bits) minus 1, because we include zero.

Right, this is beginning to feel a little bit like an entry-level computer science course, so let's zoom out and talk about in which cases we would want something like this kind of data versus this convenient file here, which is, as you can see, a text file. The answer is simple: humans.

Now, if you are trying to get into computer science and learn programming, then check this out. Brilliant.org has a 30-day free trial that you can use to just check out their entire platform, and if you don't know what Brilliant.org is, it's an amazing website filled with lots and lots of really high-quality courses on various topics. I really like their introductory computer science courses. They're a fantastic way for beginners to wrap their heads around what it actually means to think like a programmer. Learning this kind of logic can be challenging for people who are new to it, and Brilliant's visual, engaging, interactive way of teaching helps so much with this process, which also flows really well into their new Python courses, where they'll actually teach you the language and how to build some applications. But the best part is that it doesn't stop there. They have such an amazing math course library. The thing about math is that it benefits so greatly from Brilliant's way of teaching, all of it being presented so visually and with
these widgets that you can play around with to see how different values change the way that it works, along with being quizzed every step of the way to make sure that you're actually learning and retaining this information. There's honestly no better way to learn. As I mentioned, Brilliant have a 30-day free trial that you can use to check out their entire platform. Just go to brilliant.org/TheCherno (link will be in the description below) and try it out for yourselves. Brilliant have also been nice enough to offer you 20% off an annual membership if you do go on to like it, using my link in the description below. Huge thank you as always to Brilliant.org for sponsoring this video.

Text files, put simply, are not good for the computer. The computer does not excel with text files, and I hope that it's not hard to see why that is. Each one of these little characters that you see here is typically, compression and other encodings aside, expressed as a byte. So if we go back to here, where we actually can see some text written, you can see just how much room that text takes up. If we're talking about how to store this data in as optimal a format as possible, then the fact that we need to store all of the characters of this string, and then design code that will read that string and match it potentially with another string so that we know what it is we're trying to read, is a little bit wasteful. What I'm trying to say here is that it requires processing power. It requires our computer to do transformations to go from English words, and even the layout of this file, to something it's more comfortable with, which is basically this kind of data here.

So with that in mind, the answer might just be to go with binary all the time. But again, that's where the human comes in. This is very hard for us to read. It's very hard for us to edit, especially without breaking the format and knowing what we need to do where. But on the flip side, this is very easy to edit. If I want to, I don't know, change the
physics solver velocity iterations, whatever that means, to like 15, I can just do that. It's very easy for me to do that. If I know that some other collision layer or something needs to exist here and I want to perhaps manually add that in here, you can see that I can basically figure out my format rather easily. I can be reasonably sure that this will in fact succeed, because I can just see the format around it, and the reason why I'm able to do that easily is because I'm a human and this makes sense to me.

But that's not all. If other humans also come in and change any of this, either directly in the text file or, more sensibly, by using software like, I don't know, the game engine editor, because it's text I can easily see these changes. I can see that this line over here was added. I know exactly what the change is. If I have to merge files together or whatever, it's a piece of cake, because as a human I understand that.

This is the fundamental reason why text serialization is, well, quite frankly, used at all. It's the whole human factor. If a human at any point in time needs to make decisions or change data, we're prepared to take that extra step and make the computer do more work than it otherwise would have, in order to transform the representation of that data to something that makes sense to us humans. Because I'm sure that you can imagine that if you had to deal with the merging of this data instead, it would be a complete nightmare. It wouldn't make any sense. Not to mention that things like simply changing a name to a longer string can actually break the binary format entirely, because the rest of the file might depend on the size of certain things or certain offsets within that file. As soon as you insert some bytes or remove some bytes from the middle, that could break the entire rest of the file, and you'd basically have to have a really good understanding of the format and adjust it in the right places, again usually using software tools rather than just
trying to edit it manually.

And so that's basically the golden rule: text, human; binary, computer. If you don't ever plan or need to actually see the serialized data on disk, or make adjustments to it, or do things like merging together multiple people's versions of that file, then binary is the overwhelmingly better choice. However, if it is a file that involves a human, so the human has to look at it, edit it, merge it, whatever, then of course text is the best option.

Another really good way to look at this is that text files are quite good for your master files and binary is quite good for your published files. So what on earth do I mean by this? In the sense of game engine or game development, for example, master files are files that you use during the development of the game. A good example might be your 3D model as a glTF or an FBX file. It could be an image or a texture as a JPEG or a PNG. It could be a Photoshop file or a .blend file. It's the files that you are actively using as part of the development of the game. They're usually in an intermediate format.

Another great example of this is your scene file, so the file that actually stores all of the stuff that you've put into a scene inside your game. For example, this scene, the castle interior scene from my latest game, is serialized just in YAML, in text. The primary reason why this is so powerful is because I can just plainly read it, and if someone modifies anything, like this just changes to this, then over here in version control I'm going to see exactly what the change is. If someone adds another entity, as long as these UUIDs are sorted, it will just appear between two other entities, for example. If multiple people are working on that scene and I have to combine their changes, I can easily just do that using a text editor. I don't have to worry about either creating dedicated game engine tools for that purpose or worrying whether or not they're actually going to work properly, because I'm down at this
primitive level where things just work. The data is just text. I can see exactly what it is and make the necessary adjustments to it.

Whenever I talk about this, the immediate next question that people have is: yeah, but isn't this slow? This is two and a half thousand lines of text that all needs to be parsed, that all needs to be sorted out and transformed in order for us to actually load a scene in our game. And that very naturally brings us onto our next point: are you in a development situation, or are you in a runtime, play situation? Because if you're developing this game and you're just in the game engine's editor, then, first of all, lots of extra stuff probably has to happen in order to load a scene in the development environment, in the editor, versus in the runtime, because things have to be available to be mutated, to be edited, to be changed. So you're already taking a performance hit compared to what it would be like for someone playing the game when they actually load a scene in the runtime. But also your goals, your priorities, your environment are different. It's okay to wait a couple of extra seconds for a scene to load, because you're simply working on the project, and it's fine to just wait a little bit extra for something to load.

If you contrast that with the runtime, though, if things in the game are taking long to load, which causes either stutters, or loading screens being too long, or loading screens existing at all, that hurts the gameplay experience. That is something that we would absolutely try and avoid, because if you're sitting there playing a game, you don't want things to take time. You don't want delays. Whereas if you're developing the game, that's more or less fine. Of course, the magnitude of these delays absolutely matters. If a scene takes 3 minutes to load versus 3 seconds, even in a dev environment, that just sucks. But it's generally not going to be like that. In fact, I should
probably do a little bit of profiling to see what the difference exactly is.

And so hopefully you can see from the discussion we've just had how text naturally fits really well into the development of a game, and binary fits really well into the runtime of a game. The actual final game that you ship should probably contain all binary files, because no humans are going to be reading those files and you want it to be as fast as possible. No one's going to be worrying about merging different scene files using Git or whatever. It's the published game; people are just going to play it. But during dev, that's where text is really, really important.

But of course there are exceptions to both rules. You could have specific files that you do want players to be able to either modify or share, which may mean that even though it's the runtime and it's the shipped game, you may want those files to be text. And conversely, there may be files that, if they were text, would just take way too long to load in your game engine's editor, and therefore it might make more sense to just have them as binary. A good example of that might be textures or 3D models. We don't necessarily need to edit the vertices of our 3D model by hand, or deal with two artists working on the same texture and then trying to merge their changes together. So that, together with the fact that it might actually be a lot of data to process and handle and transform, might mean that those files are better suited to binary, even though they are in the dev world.

And so that's how serialization works in Hazel, and how and why we choose text or binary. In the actual game project, the one that we develop, basically everything we have here that is generated by Hazel, such as the scene files, is just stored in plain text. However, when we build and ship the game, all of these assets are transformed into binary data stored within this asset pack file. If you want more information about how the asset
packaging system works in Hazel, I made a video about that recently; I'll have it linked up there. But the main benefit is that this asset pack contains all of the assets in an optimized format for Hazel's runtime to just load: a format not optimized anymore for editing or merging or viewing, but rather a format that suits the computer versus the human.

Hope you guys enjoyed this video. If you did, please don't forget to hit the like button. In a somewhat continuation of this video, what I want to do sometime in the future is take you through the actual C++ code that does the serialization and deserialization of both of these formats inside Hazel, so you can have a practical look at how we actually achieve these things. If you're interested in seeing that, please leave a comment below, along with any other suggestions that you have for videos you'd like to see. And finally, if you'd like to get access to Hazel, either to use it, or to support the project, or to just get access to all the source code, you can go to patreon.com (link will be in the description below). Thank you to all of the Patreon supporters that make Hazel possible. I'll see you guys next time. Goodbye.
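
The "text for the human, binary for the computer" rule from the captions can be illustrated with a minimal sketch. This is not Hazel's code (Hazel is written in C++ and its scene files use YAML); it is a Python stand-in under stated assumptions: JSON replaces YAML only because it ships with the standard library, and the record, its field names, and the binary layout are all hypothetical.

```python
import json
import struct

# Hypothetical settings record; the field names are illustrative, not Hazel's.
settings = {"velocity_iterations": 15, "position_iterations": 3, "gravity": -9.81}

# Text form: field names are stored as strings and numbers as decimal digits,
# so a human can read, edit, and diff the result.
text_bytes = json.dumps(settings, indent=2).encode("utf-8")

# Binary form (assumed layout): two unsigned 32-bit ints and one 64-bit float,
# little-endian, in a fixed order that writer and reader must agree on.
LAYOUT = struct.Struct("<IId")
binary_bytes = LAYOUT.pack(
    settings["velocity_iterations"],
    settings["position_iterations"],
    settings["gravity"],
)


def load_text(data: bytes) -> dict:
    # The parser scans characters, matches key strings, and converts digits.
    return json.loads(data.decode("utf-8"))


def load_binary(data: bytes) -> dict:
    # The reader pulls fixed-size fields straight out of the buffer.
    velocity, position, gravity = LAYOUT.unpack(data)
    return {
        "velocity_iterations": velocity,
        "position_iterations": position,
        "gravity": gravity,
    }


if __name__ == "__main__":
    print(len(text_bytes), "bytes as text")
    print(len(binary_bytes), "bytes as binary")
    # Each byte shown as two hex digits, like the hex editor view.
    print(binary_bytes.hex(" "))
```

Both loaders return the same record, but the text form spends bytes on key names and formatting that the binary form leaves implicit in its layout, which is exactly why the binary form is hard to read or merge by hand.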
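
The point about a longer name breaking a binary file can also be sketched. The layout below is made up for illustration (a stored byte offset, a length-prefixed name, then a value) and is not any real Hazel format; it only shows what happens when bytes are spliced into the middle of a file whose other fields depend on sizes and offsets.

```python
import struct


def build(name: bytes, value: int) -> bytes:
    # Hypothetical layout: uint32 offset of the value, uint16 name length,
    # the name bytes, then the uint32 value itself (all little-endian).
    header_size = 4
    name_block = struct.pack("<H", len(name)) + name
    value_offset = header_size + len(name_block)
    return struct.pack("<I", value_offset) + name_block + struct.pack("<I", value)


def read_value(data: bytes) -> int:
    # Follow the stored offset to find the value.
    (value_offset,) = struct.unpack_from("<I", data, 0)
    (value,) = struct.unpack_from("<I", data, value_offset)
    return value


original = build(b"Door", 1234)

# Naive hand edit: splice a longer name into the middle without updating
# the length prefix or the stored offset.
name_start = 4 + 2  # offset field plus the 2-byte length prefix
naive = original[:name_start] + b"CastleDoor" + original[name_start + len(b"Door"):]

# Tool-style edit: regenerate the file so length and offset are recomputed.
rebuilt = build(b"CastleDoor", 1234)

if __name__ == "__main__":
    print("original:", read_value(original))
    print("naive edit:", read_value(naive))
    print("rebuilt:", read_value(rebuilt))
```

In the naive edit the stored offset now points into the middle of the new name, so the reader interprets name bytes as the value. Regenerating the file through code that knows the format keeps it valid, which matches the remark that such edits usually go through software tools.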
Info
Channel: The Cherno
Views: 46,440
Keywords: thecherno, thechernoproject, cherno, c++, programming, gamedev, game development, learn c++, c++ tutorial, game engine, how to make a game engine, text, binary, serialization, text vs binary, binary vs text, binary vs text serialization, c++ serialization, writing files, reading files
Id: Lq9KXnU4_yE
Length: 13min 48sec (828 seconds)
Published: Tue Apr 30 2024