How does the database guarantee reliability using write-ahead logging?

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
so every persistent database needs to be reliable which means any updates you file into the database needs to be reliably stored onto the disk no matter if the crash happens anything goes bad uh all of your committed transactions everything that you have committed to a user to have saved onto the disk needs to be there right so there is a very popular technique to ensure that this happens it is called as write ahead login so in this video we'll go through what right head logging actually is uh how it is implemented advantages of it we will go through a few very solid advantages of writer headlocking one which includes gaining an amazing performance out of your database and we'll also take a look at a few internals of right ed logging helping us understand how this is structured what kind of other things affect it okay so let's jump right into it so as i said uh reliability for any database is very important in most cases in most in most cases databases guarantee persistence which means if you are performing any transaction any update onto the database and if your database says hey this changes are persisted no matter if your process crashes your machinery starts anything anything boots up again the the changes that have been committed needs to be there right you can't say that hey the changes are lost so database offers you this guarantee so a few solid like few very common databases that do this is my sequel postgres dynamodb and whatnot any database to be honest that gives you the persistence has to do this so what exactly happens upon commit upon commit like you have made some changes onto your database you actually fired a query and then you committed the transaction so when the transaction is committed what exactly happens so the changes you made let's say you updated a particular row the changes you made are flushed onto the disk disk as in any non-volatile storage it need not be this it can be any non-volatile storage for example s3 not recommended but for example when you say that hey i've saved your changes it needs to go into a non-volatile storage something that is not prone to power loss os failure and hardware failure so whenever you are committing something right whenever you are committing something you have to ensure that no matter what happens no matter if there is a power loss or operating system failure or hardware failure as in your machine crashes so long as the disk is intact right so long as a non-volatile storage is intact you would be able to see those changes that you have committed right that's what should happen on commit because that's what is what your database is guaranteeing you so a typical flow that happens is like we'll just go through and what exactly should happen at commit so an engineer fires a query onto a database through mysql client postgresql what not let's say he or she updates her as soon as the row is updated as soon as the transaction is committed the corresponding blocks on the disk are changed right so there are some changes that are made on the b plus three struct if i'm talking about relational database there are some changes that are made onto the b plus 3 structure onto the disk where the table data is stored and if there are any supporting index for that particular table those index are altered so the few blocks that are affected are say this and this right where the actual so on this block the actual row have was there and this was the block where the corresponding index or there so the blocks were changed maybe a few other blocks like this who which come on to the path right so these sort of changes happen we every time some commit is happening it is translating into multiple disk ios kind of okay which is what you are guaranteeing but to be honest disk writes they seem very simple that you are doing this file open and right but they are indeed very complicated it's not just one set of thing that is going there so let's say the way you are writing something onto the disk it is first done it is first added to an in memory flush buffer which is there in your ram from there it is flushed onto os buffer cache so your operating system let's say ubuntu windows and whatnot it is so every operating system has a buffer cache in which it caches the disk blocks so it's there it's it's passed there then it is passed to this cache every disk that you use has some some cache which caches most frequently access disk clocks to make it a little faster and then it is finally dubbed onto your solid state hard disk or your magnetic storage the actual writing happens there so whenever you are writing something onto the disk these four stages your data always goes through by default there are ways through which you can bypass a few of them while you're having fsync so when you open a file see what kind of mode that particular file is being opened with so the mode the mode could be a regular read-only mode a regular write-only mode but there is an uh an additional option that you can pass in which is that hey i want to open this file in the sync mode which means that step if you open a file in a sync mode it would bypass all caches and would ensure that whenever you drum whenever you write something onto the file it immediately gets synced or it immediately gets flushed onto the disk ensuring that there is no data loss so now coming to write ahead logging on what this technique actually does so this is a very standard method of ensuring data integrity basically basically reliability so the idea here the the right ad logging is just a concept it can be applied to so many different businesses not just not just very specific to relational databases it can be applied to any database in general any place where you want to ensure reliability you will use right ahead logging to solve the problem so the core idea the core idea of right head logging is before you make the changes to actual data files your tables your indexes what we saw in the first one like before you make changes here write those changes log those changes in a separate file which is open in a sync mode as as we just discussed let's log the changes for example you are finding a query update users uh set name equal to a where id is equal to 1. this this change should be logged in a file before you make the update yeah before you flush the updates onto the desktop as part of your commit it should be run first on the lock file and then the changes should be made onto the table or the index this way what happens is like the the the usual flow that would happen is whenever you update a trigger you log the entry you may not log what exactly just needs to be like what exact changes were meant but at least just dump the sql query or any query that you have fired with just say that hey this update was made right so this gives you that guarantee that hey at least because let's say you are updating a million rows you are not waiting for those million rows update to be flushed onto the disk you are just writing that hey update that the right so that entire update statement is flushed onto the disk in the lock file and then you take your own time to write those status onto the disk this way this operation as you would have very very rightly get that this operation is lightning fast because the file is open in sync mode and it's just a simple append only file in which you are appending all the logs one after another and then you take your own sweet time to update the changes onto the disk the actual disk clock so as part of a commit transaction as part of your commit the first thing that you will do is write all the statements that are part of the commit one after another non-select statements all updates should go over here okay and then once that is done you then trigger an update onto your data files and onto your indexes because it will take its own suite time to do it right okay so what are the advantages of right ahead logging first of all on every commit like if on commit you were flushing the entire data changes that happen entire data alterations that happen onto the disk you can now avoid that and this is one of the key reasons why right ahead logging gives you immense performance out of your database because now there is no need to flush the data changes on every commit because you are already logging the updates in a separate log file you do not have to commit those changes immediately you can keep them in an in-memory buffer where those where those where all of the changes are buffered you can just write those things and then asynchronously slowly and steadily once a while dump those changes onto the disk right so these are micros in this not major changes that we talk about but at least microgenesis you can keep them in buffer that any time the change like the the changes are requested it can directly be served from the memory rather than hitting the disk so as part of cache alterations as part of buffer alterations you can do that then in case of a crash we recover by replaying the logs if you see the logs what what do these logs contain these logs contain this the queries that you made on the onto the database the updates you fired onto the database so in case of a crash where the changes are there in the log but not yet flushed onto the disk you can replay those changes and build and just ensure that your disk and your uh your memory buffers are all in sync so you get crash recoverability even if the changes are made onto a memory but not written but not yet return on to the disk right third it reduces the number of discretes like it's like just an extension to the first statement where we say that hey not every uh commit requires you to flush all the changes onto the disk this is very similar to that where by not really requiring us to flush every single change onto the disk we reduce we significantly reduce the number of disk rights that happen onto the disk by just manipulating memory buffers while logging a statement in a lock file so this gives us the advantage of minimizing the discard which helps us gain a humongous performance out of our db in the previous video where we talked about uh when we talked about the overall ways like ways to get performance out of your database uh how to fix your database in the longer run i talked about a parameter that is a flush frequency like how frequently do you want to flush the changes onto the disc every database gives you this option on altering the flush frequency of it which means that hey i want to flush all of my commits every one second but every few seconds right you can you can configure that or or do i want to make it synchronous which is what like in mysql it's innodb cmp underscore like innodbt uh in otp transaction flush commit something so i'll just i'll just link that video in the description and in the i tag you can go through with it and see you know where uh the how how we can tune the flush frequency there so with that particular parameter what would you do like that that is exactly doing this uh where it is ensuring that hey i do not want every commit to be synchronously flushed onto the disk you can buffer those changes in memory and let's say every one second flush it onto the disk so you are slashing a lot of disk rights but obviously you don't lose out on guarantees that you have given to your user so that's why you just make an entry into your lock file that hey this changes are done and then everyone and you make all the changes in your in-memory buffer and then slowly flush your changes onto the disk right so typically by reducing a lot of disk rights you are gaining an immense set of performance out there one very important another one very important uh advantage of using a log or right ahead log is you get a point in time recovery imagine you have a sequence of sql queries that have imagine you have a sequence of queries that are fired onto the database arranged by time right if you have that you can create a point in time snapshot let's say you fired query one query two query three query for query five and let's say you want a point in time snapshot of your database until just the third query you can spin up a raw database a blank database and start applying these logs till the point you wanted like one two three three queries followed to the database you get a point in time recovery or point in times not recovery but a point in time snapshot of your database till that stage so this is a very amazing way to like write edit logging helps you create that point in time so you can actually time travel on your database and see hey at this point in time how did my database look like at this point in time how did my database look like so you can actually do that using right headlock that's where right head logic plays a very important role in automation testing where you want a specific like you want data set up till a specific date because after that something was messed up you might want to like your data went in an inconsistent state for some bug or for some reason and you want to restore you want to go back in time and and reach to a particular state which is your which is a consistent state you can actually do that it will lead you to lose a few rights but that's what you want because the data became inconsistent so right ahead logging gives you a way to create a point in time snapshot now let's see one very interesting aspect of it so even right eye logging it seems it sounds very simple right i'm just logging an entry but it's not that simple so when you write something onto the disk right it needs to be protected by protection i mean that while let's say you are writing a line onto the disk if your process crashes you need to ensure that the thing that you have written onto the disk is proper because assuming as if you were writing a 100 character long line you wrote like you wrote the changes to 512 bytes and then your machine crashed now what how do you know that the record that you have written onto the disk is properly written there is no data loss while writing it so that's where what you would have is you use something like every wall like every right-hand login uses a cyclic redundancy check as simple as a checksum right and it is protected by that which means that the first thing that it would write onto the file is this crc and then it would write the entire thing like for example if you fire an update it would first write the crc it would actually it would first compute the crc then write the crc into the file and then it would write the other content right so this way it ensures that if while reading or while recovering from the crash it will tell you hey the record that you are trying to write is is is corrupted right it will give you that error so that again protecting your data from going inconsistent this is what your crc typically does it ensures that hey the changes that are written onto your log file they are proper they are intact they are not meddled by anyone while transit while flushing anywhere right so it gives you that warning that hey something is wrong with the logs please rebuild it or do do something around it so that's how it happens so right ahead logging is by default enabled by all persistent databases out there you can choose to turn it off would not recommend it but how you can choose to turn it off so now let's spend this the last section let's spend some time talking about how this file is actually structured so whenever you are writing something like right ahead log is a very simple append only file which means you can open the file only in append only mode so any change you made is just appending to that file one line after another so a write ahead log is not just one file they write a headlock typically contains multiple set of files each file called as a segment and each segment is roughly 16 mb big like each segment like right ahead log is contains multiple segments each segment is one file each file and each file is at max 16 mb big within that file uh within that segment you have pages you have pages each page is an 8 kb big page in the previous video we talked about how disk reads happen in terms of blocks if you are unaware of that concept go through that video i'll link it in the description and in the idec it would help you uh it would help you build a very solid understanding on how disk io actually happens with help of blocks so then each of this page so whenever you read a file let's say your block size is 8 kb you are reading a block of 8 kb 8 kb 8 kb every time so within that 16 mb file you have 8 kb pages and within that page you add and you add your entries so your writer headlock entries goes within that page so there as soon as you write something into your uh right ahead log it gives that entry an identifier that identifier is called as a log sequence number what exactly is this lock sequence number it's the sequence number of that entry in the log file in the right headlock this entry is not 0 1 2 3 4. but rather they played it very smart the entry is actually the byte offset in the log file which is that hey within this file this line starts at this like this entry starts at this offset zero zero zero one two as in the twelfth bit for example this starts at twenty sixth offset so this is these are the actual byte offset of that entry in the log file now what's of what's advantage of this let's say you have a log file right in which you you have read the particular entry it is very easy to estimate that hey till what point have i read the entry it's very it's also extremely simple to go to that particular position in that file because your entry is your lsn entry is nothing but your byte offset it's very easy to hit that particular point and reach it so this the point at which you have read your log file you have applied your log file onto the disk it's stored in a separate table which says that hey i've read my log file to this point when the next read happens it will start from that offset only so it's it's all it needs to store is this one very simple lsn number which is nothing but a byte offset within that lock within that segment so a way to identify till which point have i read this is that in this right ahead log on this segment file for this lsn number or basically the byte offset till which i have completed my red my read and my flush onto the disk right so that's a very interesting advantage of you using not reinventing like not taking the lsat from 0 1 2 3 4 instead using the byte offset as your sequence number so this is a very solid optimization that your database does now again after writing this it writes the crc and the update statement so instead of writing the actual records that were updated it just writes the update statement advantage of that is if you update a million rows this should not up have updates of all million rows that were changed instead it can just write that sql query that you fired onto the database and then while replaying it it can just send it whether it want to replay and it will just replace this particular update onto the database right so yeah that's how this right ahead logging is typically structured it has some really solid advantages that we saw it helps us gain database it may helps make our database reliable it helps in ensuring that we can have a point in time recovery it optimizes the database performance by not needing to flush all the commits onto the disk immediately so that is a stupidest example of how a very simple looking strategy a very simple append only file can guarantee you reliability while giving you performance and this is technique not just limited to database uh not just limited to relational device but any database or anywhere that you would want to have a very strong reliability guarantee you would use something like a right ahead login right okay that's it for this one uh i hope uh i hope you like uh what i covered uh if you like this video give this video a thumbs up if you like this channel give this channel a sub and i'll see you in the next one thanks
Info
Channel: Arpit Bhayani
Views: 18,331
Rating: undefined out of 5
Keywords: Arpit Bhayani, Computer Science, Software Engineering, System Design, Interview Preparation, Handling Scale, Asli Engineering, Architecture, Database Reliability, Durability in ACID, Write-ahead Logging, Write-ahead Log, Write-ahead logging explained, Write-ahead logging meaning, Write-ahead logging in Postgres, Write-ahead logging internals, Database Performance with Write-ahead Logging, Database Engineering
Id: wI4hKwl1Cn4
Channel Id: undefined
Length: 22min 5sec (1325 seconds)
Published: Mon Mar 21 2022
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.