How to build a Durable Write Ahead Log in Go | Segmentation, Auto Recovery, Checksums

Captions
In this video, we're going to dive into the internal workings of write-ahead logs and finally implement one for ourselves using Go. I delve deeply into the topic in my latest newsletter, which you can find linked below in the description; for a more comprehensive understanding, I recommend reading the post before continuing with this video.

WALs are found in a variety of systems, ranging from databases such as PostgreSQL, MySQL, and SQLite to consensus algorithms like Raft. At its most basic, a WAL is a file where entries are appended at the end. This append-only method is crucial to its functionality: since entries are always added at the end, write operations are remarkably quick due to the absence of disk seeks. Plus, this approach minimizes the risk of corrupting existing data, as previously written data remains unaltered when new entries are added. When a system such as a database initiates a write operation, it first records the operation in the WAL prior to updating any on-disk structures. This step is key to ensuring that the operation can be recovered even if the system crashes before the write is fully completed.

Essentially, a WAL is comprised of a series of log records. Each record typically contains information like the log sequence number (LSN), the stored data, and a cyclic redundancy check (CRC) code. Entries in the WAL are sequentially ordered, and each has a unique log sequence number. The CRC code, which is generated using the LSN and the stored data, plays a crucial role in verifying data integrity during WAL reads. The nature of the data stored in the WAL varies depending on the system in use; in a database context, an entry might contain details about a transaction, including its type, the data involved, and the timestamp. In the event of a system crash, these logs enable the database to replay transactions and restore the system to its last known consistent state. During reads, the WAL recalculates the CRC for each record; if the recalculated CRC doesn't match the original, the system flags the data as corrupted and returns an error.
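The record layout just described can be sketched in Go. This is a minimal illustration, not the video's actual code (which, as we'll see later, generates its entry type from protobuf); the field names and the exact byte order used to combine the LSN with the data are assumptions.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// WALEntry mirrors the record layout described above: a log sequence
// number, the stored data, and a CRC over both.
type WALEntry struct {
	LogSequenceNumber uint64
	Data              []byte
	CRC               uint32
}

// checksum computes a CRC32 over the LSN followed by the payload.
// Little-endian encoding of the LSN is an assumption for this sketch.
func checksum(lsn uint64, data []byte) uint32 {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, lsn)
	return crc32.ChecksumIEEE(append(buf, data...))
}

// verify recomputes the CRC and compares it with the stored value,
// as a WAL does on every read.
func verify(e WALEntry) bool {
	return checksum(e.LogSequenceNumber, e.Data) == e.CRC
}

func main() {
	e := WALEntry{LogSequenceNumber: 1, Data: []byte("set x=1")}
	e.CRC = checksum(e.LogSequenceNumber, e.Data)
	fmt.Println(verify(e)) // true: intact entry passes

	e.Data = []byte("set x=2") // simulate on-disk corruption
	fmt.Println(verify(e))     // false: the mismatch is detected
}
```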
Since WALs typically serve as auxiliary data structures to primary systems like databases, their performance should not impede the overall system. To optimize write operations, WALs often use an in-memory buffer for temporarily storing incoming writes. This approach enhances write speed at the cost of some durability. The flushing interval for a WAL can vary, often set to a few hundred milliseconds, but it is generally adjusted according to the specific requirements of the application.

An important aspect to note is that simply flushing the buffer to the disk doesn't always guarantee data persistence. This is because operating systems use their own in-memory buffers for disk writes to optimize performance; in the event of a power outage, these buffers might be lost before the operating system has a chance to synchronize them with the disk. To address this, operating systems typically offer an fsync API, which can be used to force synchronization of the buffer with the disk, ensuring data durability. A well-designed WAL takes advantage of this API, thereby reinforcing the guarantee of data persistence even in the face of unexpected system failures.

Since every state change in the system is expected to be persisted into the WAL, the log file can grow massive in size, and a large log file slows down the WAL's startup and recovery processes. To manage this efficiently, WALs employ a technique known as segmentation: the log file is divided into smaller, more manageable segments, each allowed to grow only up to a fixed size. Once a segment reaches this limit, it is closed and a new one is created for subsequent entries. Segmentation offers several benefits. Firstly, it enhances the performance of the WAL, making it easier to manage. Secondly, and perhaps more importantly, it introduces a level of fault isolation: if there's corruption in one segment, it's less likely to impact the previously closed segments. This means that in the event of a fault, recovery processes can be more effective, as they typically only need to address the affected segment without worrying about the integrity of the entire log.
Now let's address the crucial aspect of WAL repairs, or how WALs handle data corruption. Despite the effective isolation provided by log segmentation, it's essential to recover as much usable data as possible from the corrupted segments. This repair process usually concentrates on the most recent log segment, as this is where corruption is most likely to occur. A common approach to repairing a WAL involves sequentially reading through the entries in the latest segment and verifying their integrity, checking aspects like the consistency of the data, the correctness of the end-of-file markers, and the validity of the checksums. This process continues until a corrupted entry is encountered, indicated either by a damaged data point, an unexpected end-of-file marker, or an invalid checksum. The log file is truncated at this point, removing the corrupted portion. This results in a clean log file that retains only the valid entries, thereby ensuring the integrity of the data.

With this understanding of WAL repairs, we're now ready to dive into the actual implementation of a WAL and explore how it functions in real-life scenarios. Here we are inside VS Code. Instead of writing code line by line, I've already implemented the write-ahead log, and we'll be examining the functionality and the code base. This approach not only saves us time but also aids us in building the mental skill needed to comprehend an existing code base, which is similar to what you'll encounter in the real world as well.

So let's head over to wal.go; this is the file that contains most of the code for the write-ahead log. The first thing we do is define the struct for the WAL. The struct has a couple of member fields, the first being the directory: the name of the directory inside which we want to store all the WAL segment files.
We then have a field called currentSegment, which holds the current log segment file that we're writing to. Then we have a lock, used for synchronization of the WAL writes. We then have a lastSequenceNo that tracks the sequence number of the last entry written to the write-ahead log. We have a buffered writer to write to the log, and a timer that controls the flush interval for the buffered writer. We then have a shouldFsync option, which controls whether we want to run an fsync every time we flush from memory to disk. We then have a maxFileSize field that controls the maximum log segment size we want to allow; once a log segment exceeds this size, we create a new segment. We then have the maxSegments parameter, which controls the maximum number of segment files we want to maintain in the WAL; once we exceed this number, we start discarding the oldest segment files. Then we have the currentSegmentIndex, the index of the latest log segment. Finally, we have the context and cancel functions, used to control the goroutines that we spin up in the WAL.

So let's see how we initialize and open an existing WAL; we use the OpenWAL function for this. This function takes a couple of arguments: the first is the name of the directory in which we want to store the WAL, the second is enableFsync, which controls whether we want to sync to the disk every time we flush. Then we have maxFileSize, which controls the maximum log segment size, and finally maxSegments, which controls the maximum number of segments we want to maintain in the log. Next, we ensure that the directory in which we want to create the log exists; this does nothing if the directory already exists. Next, we get all the files that exist inside this directory. This is only relevant when we already have an existing log and are just opening it; in the case of a new log, the file list will be empty. In case we have some files returned, we want to get the file with the latest segment, since this will be the file that we'll start writing to.
In case there are no files, we create one, and it will act as the first segment file for our log. Whichever file this is, we open it and seek to its end; we seek to the end to make sure that we're only appending to this file. Next, we create a context and a cancel function; we'll use these to control the goroutine that we run later on, and we initialize the WAL. Before we return, we update the last sequence number of the WAL by reading the last entry in the WAL. What this method does is fetch the last log entry and read its sequence number; in case it doesn't find one, we simply return zero, as that will be the sequence number we start from. We'll skip the getLastEntryInLog method for now, since it will make more sense once we understand how the writes happen, and it's pretty similar to the readAll method that we'll go over.

So let's go back to the open function. Once we've received the last sequence number from the segment we just opened, we set it on the WAL and finally return it. We also run a goroutine before we do that; this goroutine controls the intervals at which we sync our in-memory buffer to the disk. Let's head inside and see what it does. It uses a for-select loop to listen on two channels, the first being the timer and the second being the context's Done channel. The Done channel is only used to exit the goroutine once we're closing down the WAL. The timer channel is more interesting: we first lock the WAL and sync it to the disk.
The WAL's sync method is pretty simple: it flushes the buffered writer (the in-memory buffer) to the disk and checks whether it should fsync. If the fsync option is enabled, we run the Sync method on the WAL file we're writing to. fsync is often a slow call; that's why we let users control whether to run it. Running it means slower writes but higher durability, and vice versa. Once we sync, we reset the timer. The reason we reset the timer inside the sync method and not inside the keepSyncing goroutine is that we allow users to call the sync method manually, and in that case we want to reset the timer as well. And that's all we do inside the OpenWAL function.

Next, we're ready to see how we write entries to the log. We first lock the WAL for synchronization, and then we check whether we need to rotate the log before writing the next entry. The rotateLogIfNeeded method is quite simple: it gets the file info for the current log segment file and checks whether the segment exceeds the size limit; if it does, we rotate the log, otherwise we return. The rotateLog method first syncs the WAL to the disk, so that the current file's contents are written out safely, and closes the current segment file. It increments the segment file index, which starts from zero and counts up from there. Then it checks whether we have exceeded the maximum number of allowed segments; if we have, we delete the oldest segment file based on the segment index. Then we create the new segment file for the current segment we'll start writing to, set this new file as the current segment, update the buffered writer to point at it, and finally return.

Going back: once we've rotated the log if needed, we create the entry using the data we just received. Let's see what the createEntry method does. The first thing we do is increment the last sequence number to get the sequence number for the current entry.
Then we set the data on this WAL entry, and finally we calculate and set the CRC code for it; we calculate the CRC using the data and the log sequence number. One last thing you might be interested in is where this WALEntry struct comes from. You'll see that it comes from a generated file, produced using protocol buffers. We define the proto definition inside types.proto. It's a pretty simple protobuf file: we define a message for the WAL entry, the first field being the log sequence number, an unsigned 64-bit integer; then a data field, which we store as raw bytes; and finally the CRC, a 32-bit unsigned integer. The reason we use protobufs is that their serialization is much faster compared to something like JSON, and it also gives us strong type guarantees. Once you've written this proto file, you can generate the Go files using the command shown here, which runs the protoc compiler.

Finally, after we've created this entry, we write it to the buffer. Remember, we first write the entry to an in-memory buffer to increase write throughput. The writeEntryToBuffer method is pretty simple: it first marshals the entry using the mustMarshal utility method that we define, which is similar to a helper method I discovered in the etcd code base. It calls the protobuf Marshal method on the entry we're about to append, and if Marshal returns an error, we simply panic. The reason is that this method should never return an error unless something is really wrong with our proto definition or our code base in general, so in that case it makes sense to panic. Finally, it returns the marshalled entry. The mustUnmarshal method operates in a similar manner. So once we've marshalled the entry, we get its size, because before we write out the entry we want to append this size.
Since marshalled protocol buffers don't carry a delimiter, we use this size to know how long the protobuf entry is. So the first thing we do is write the size to the buffer, then we write the marshalled entry, and finally we return. And that's how the write entry method works; it's pretty simple, isn't it?

Next, let's head over to the readAll method; we'll go to the project outline and jump to readAll. The first thing we do is open the file in read-only mode, then we read all the entries from this file object. Let's see how the readAllEntries method is implemented. We first initialize an array to store all the entries that we'll be reading, then we start a for loop. We first read the size of the entry, because remember, when we were writing these entries into the log, the first thing we stored was the size of the data. So we read this size and then initialize an array to hold the data; this array is the same size as the value we just read. Next, we read data from the file until this array is filled; once that's done, we have all the data we need for the first entry.

Next, we take the raw bytes in this data array, unmarshal them, and verify their CRC checksum. This is done using the unmarshalAndVerifyEntry method; let's jump in and see how that works. We first call the mustUnmarshal method, which unmarshals the data into an entry object, so that the data is now available in the form of a WALEntry struct. Then we send this object to the verifyCRC method, which verifies the checksum. We do this verification by calculating the checksum again in the same manner as when writing the data into the log: we combine the data and the log sequence number of the entry and calculate the CRC. We then compare this CRC with the one stored beside the entry; if they match, the entry is valid, otherwise we return an error. Finally, we return the entry. Once we have this entry, we append it to the entries array and continue the for loop.
Once we reach the end of the file, we break out of the for loop and return the entries. Now, if you notice, the readAll method only reads the latest segment of the log; you can see that we're reading the file name from the current segment. But sometimes you might want to read from a certain offset; for that we use the readAllFromOffset method. This method takes an offset, which is the segment index from which you want to start reading; for example, you can start reading from the first segment all the way to the end. It loads up all the files in the WAL directory, iterates through the segment files until it reaches the desired offset, and from there starts reading each segment file, adding its entries to the entries array. Once it completes, it returns these entries back to the caller.

Now that we've learned how the read and write paths work, we're ready to see how the log can repair any broken segment files automatically. In the project outline, let's head over to the Repair function. The first thing we do in this method is load up all the files in the WAL directory, then we find the file with the latest log segment and open it in read-only mode. Next, we seek to the beginning of this file, because, as we discussed, we read from the beginning until we reach a corrupted entry. We initialize an array to store all the entries that we'll be reading, and then we start our for loop. The flow is very similar to the read method, except we handle the errors a little differently. We first attempt to read the size from the file. In case we face an error, and unless the error is an end-of-file error (which would mean we successfully reached the end of the file without any corruption), we replace the file with a fixed one. Essentially, whenever we face any error that is not an end-of-file error, we take all the entries we've read so far, write them into a new file, and use that file to replace the current broken segment file.
The replaceWithFixedFile method replaces the broken file atomically; we'll see how it works in a little bit. In case we were able to read the size successfully, we read the data. If we face an error while reading the data, we do the same: take the entries that were successfully read so far, write them into a new file, and use it to atomically replace the broken file. If we were able to read the data successfully, we deserialize it and make sure the CRC is valid. If the CRC is invalid, we again replace the broken file with a new one containing only the fixed entries. And if we faced no error so far, we simply append the entry we just read to the entries array and repeat this process until we find a broken entry or reach the end of the file. That's all this method does.

Let's dive a little deeper into how the replaceWithFixedFile method works. It takes an entries array, containing all the entries that were read successfully, and creates a temporary file. It marshals these entries and writes them into the temporary file in the same manner we handle all the writes. Once it has written all the entries successfully, it closes the temporary file and calls the os.Rename method to rename the temp file to the current segment file's name. The operating system handles this operation atomically, so we know there is no chance of data corruption at this stage. Once that's done, we return from the method, and this is how you implement auto-repair in your WAL.

That does it for the core implementation of a write-ahead log. To see how you can actually use this log, we can refer to one of the test cases that I've written; let's go to the tests folder, into wal_test.go.
The first test displays an end-to-end flow of writing to and reading from a WAL. We first open the WAL, passing in the max file size and max segments: we allow only three segments here, with a max file size of 64 MB. We then write a couple of entries into the log and attempt to read them; once we've read them, we verify that the entries came back successfully. In a similar manner, we test almost all the functionality of the log. If you go to the project outline, you can see we even test the repair functionality: we open the log, write a couple of entries, purposefully corrupt the data, and then run the repair. Once we've repaired it, we check that the correct entries were still recovered and that we're still able to write new entries into the log successfully.

So feel free to browse the code and understand this project a little deeper; the link to the full code is down in the description. Again, I'd highly recommend that you go through the blog post to get a deeper understanding of how write-ahead logs work and their internal implementation details. And that does it for this video, so subscribe to my YouTube channel and my Substack newsletter for more technical content.
Info
Channel: Jyotinder Singh
Views: 745
Keywords: leetcode, 30 day challenge, algorithms, interview prep, interview, interview question, coding, tutorial, c++, java, python, go, golang, system design, distributed systems, write ahead log, auto recover
Id: 7R7Ex0YkaNw
Length: 19min 7sec (1147 seconds)
Published: Mon Jan 15 2024