Greenplum Physical Backups with WAL-G / Daniil Zakhlystov

Captions
Good afternoon, and welcome to the last, slightly late, but truly last session of today. What a day it has been, with so many sessions in our cozy database-related corner, and we will close with no exception: it is again about aspects of database internal operations. I am absolutely glad to invite here on stage Daniil Zakhlystov from Nebius, and he will present on Greenplum physical backups. The stage is yours.

Thank you. Hi everyone, my name is Daniil, I work as a software engineer in the open-source managed databases department at Nebius, and today I want to talk about how our team was the first in the world to implement physical backups of the Greenplum database engine with the help of our tool called WAL-G. I would like to start with some dates. As you may know, today is the 4th of July, which is a national holiday in the United States: Independence Day. But I would like to mention two other holidays. The first one is March 31st, World Backup Day. The second one is October 10th, World Mental Health Day. I think the experienced people here already know how these two days are connected, because good backups really are the key to good mental health. So today our focus will be on how we achieve good mental health by implementing good backups.

What exactly do we mean by a good backup? I personally define three criteria: it should be reliable, low-cost, and quick to restore. What does a reliable backup mean? At first, of course, it should be able to recover; that is the top priority. And there should be a tool to verify that it is actually able to recover, so it should be verifiable. It should also be encrypted, because we don't want others looking inside our backup data. It should be well covered by monitoring, and it should be easy to install,
operate, configure, and use. It should also reduce the risk of human operator error, so it should automate the dangerous operations as much as possible.

What do we mean by a low-cost backup? This is mainly related to the backup creation part. Backup creation should be as low-profile as possible: it should not affect any internal database routines and should not interfere with the user-defined load. In other words, it should take low effort to create. It should also be efficient and occupy little space in the cloud storage service, and it should be able to throttle its resource consumption, because we don't want our backups competing with the database engine for resources while a backup is being made. My ex-colleague from Yandex, Andrey Borodin, once said a phrase that I decided to include in this presentation: you don't think about good backups, and you never know about the perfect ones. I guess that basically sums up all of the previous slides' points.

The last criterion: good backups should be quick to restore. This part is completely different from the creation phase, because contrary to creation, during restore we want to restore as quickly as possible, utilizing all of the available resources. And of course we should be able to tolerate temporary failures, because we don't want our six-hour-long backup restoration to fail just because of a two- or three-second cloud storage error or maintenance window.

OK, now that we have discussed what a good backup is, let's see what Greenplum offers us out of the box. It actually does offer something: it's called gpbackup, a CLI tool. The main characteristic of this backup tool is that it creates a logical backup, and most of its
disadvantages are inherited from the very nature of logical backups. First, it locks the DDL, which means that, for example, you cannot alter the structure of your tables during backup creation. It also doesn't leave us any chance of achieving point-in-time recovery, and even these two points alone stop a lot of clients from using gpbackup in production. It is also not really extensible, because it doesn't support many storage backends, and there is no flexible backup policy management available, unfortunately.

So now, I guess, it's time for that moment we all enjoy in big tech corporations, because we need a reason to invent our own wheel again. Let's invent it. To do that, let's look at what Greenplum actually is. Greenplum is, in fact, many Postgres instances, but it would be an error to say it is just a bunch of Postgres instances. It is better represented by the next slide: many Postgres instances running together, in parallel, with a single coordinator that manages these segments. This means that to create a backup of Greenplum we just need to create a bunch of Postgres backups. And for that we, quite unexpectedly, already have a great open-source database disaster recovery tool written in Go: WAL-G. It initially supported only Postgres, but today MySQL, SQL Server, MongoDB, and Redis are supported too, and we are working hard on expanding that list, so today we are adding Greenplum. It is the very nature of WAL-G that it is designed to be extensible: it supports many storages, it supports many compression and encryption codecs, and you can plug in your own codec or storage engine easily. It also has resource throttling and parallelism, it is written in Go, and there are flexible backup policies available. OK, we've got a tool to back up Postgres, so let's go and back up Greenplum.
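Before that, a quick illustration of the pluggability just mentioned. This is only a sketch of the idea, not WAL-G's real API: the tool talks to a storage backend and a compression codec through two tiny interfaces, so any implementation of either can be swapped in. All class and function names here are hypothetical.

```python
import gzip

class MemoryStorage:
    """Stand-in for an object store such as S3."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

class GzipCodec:
    """One possible compression codec; another codec plugs in the same way."""
    def compress(self, data):
        return gzip.compress(data)

    def decompress(self, data):
        return gzip.decompress(data)

def upload(storage, codec, key, data):
    """The backup tool only ever talks to the two small interfaces above."""
    storage.put(key, codec.compress(data))

def download(storage, codec, key):
    return codec.decompress(storage.get(key))
```

Swapping S3 for local disk, or gzip for another codec, would then not touch the upload/download logic at all, which is roughly the design point being made in the talk.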
Let's start from the simple Postgres setup: just a standard Postgres cluster. As you can see, we have a primary and a replica, and we do database journal (WAL) archiving from the primary and backups from the replica, to lower the stress on the primary node; both go to the object storage. How does a Postgres physical backup actually work under the hood? It's quite a simple process. At the start of the backup we call a Postgres function called pg_start_backup, and under the hood it does three things. First, it creates a checkpoint; to describe it really on the fingers, it brings the database catalog files on disk to a consistent state. Then it enables full-page writes; basically, this is protection against parts of pages residing in a filesystem-level disk cache, and we have to enable it to avoid so-called torn pages, as they are called in Postgres, in the backup. The last thing is that we record the backup start LSN, and we begin uploading the database catalog files to S3. After we have finished, we just call pg_stop_backup and record the backup stop LSN. That's how simple it is: you've now got a Postgres physical backup. To restore a backup, you actually need two things: the database catalog copy that we created at the previous stage, and a continuous sequence of database journals starting from the backup start point up to the desired recovery point. By combining these two things, you receive a restored cluster.

Well, this talk is about Greenplum, so let's see how we can utilize that approach to create a Greenplum backup. As you can see in the picture, this is a simple Greenplum cluster topology: there is a single coordinator instance and four segments. You can actually look at this topology as just five Postgres clusters of the kind we have discussed.
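Before moving on, the restore recipe just described (a base copy plus a continuous WAL sequence) can be condensed into a small validity check. This is a deliberately simplified sketch: LSNs are plain integers and `archived_segments` is a set of segment numbers, whereas real Postgres LSNs are 64-bit positions and WAL segments are 16 MB files with timeline-qualified names.

```python
def can_restore(backup_start_lsn, backup_stop_lsn, archived_segments,
                target_lsn, segment_size=16):
    """Check that a base backup plus archived WAL can reach target_lsn."""
    # Replay cannot stop before the backup finished copying files.
    if target_lsn < backup_stop_lsn:
        return False
    # Need an unbroken chain of WAL segments covering [start, target]:
    # a single missing segment breaks recovery.
    first = backup_start_lsn // segment_size
    last = target_lsn // segment_size
    return all(seg in archived_segments for seg in range(first, last + 1))
```

The point of the sketch is the "continuous sequence" requirement: one missing journal segment anywhere in the chain makes the whole backup unrestorable past that gap.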
So the process of backing up such a system will be the following: we just repeat the Postgres backup process multiple times, but in parallel. Just as at the previous stage, we call pg_start_backup, start copying files to S3, and when we are finished we call pg_stop_backup. That's easy. The problem is: after we have made all these backups, what do we actually want to consider the backup stop point? When we restore, we need some moment in time at which the cluster was in a consistent state, so we somehow need to synchronize all these segments and the coordinator node. This actually wasn't possible in Greenplum 6, but there was an extension called gp_pitr developed for Greenplum 7, and my ex-colleague whom I mentioned before, Andrey Borodin, backported it onto the actual Greenplum 6 production branch. So now, thanks to this extension, we can utilize a function to create restore points. A restore point is a cluster-wide consistent point, created at a moment when there were no running two-phase commits (two-phase transactions) in the entire cluster.

OK, we've now got a tool to create restore points, which means we've got a mechanism to create a Greenplum backup. Great: we've got the database catalog copies, and we've got the continuous sequence of database journals up until the restore point created beforehand. By combining these two, just like in Postgres, we receive a restored cluster. Whoa, that was easy, basically. But the actual question is how we achieve point-in-time recovery, because yes, we can restore to some single stop point, but unlike Postgres there is no option to restore to an arbitrary LSN or timestamp. To get PITR we basically use the following approach: if our RPO is, for example, 15 minutes, we just create these restore points every 15 minutes.
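To restore to an arbitrary moment, the natural rule is then to take the newest restore point at or before the target. A minimal sketch of that selection, with plain numbers standing in for restore-point timestamps:

```python
import bisect

def pick_restore_point(restore_points, target_time):
    """Return the newest restore point at or before target_time.

    restore_points must be sorted ascending (e.g. one point per RPO
    interval); returns None if the target predates all of them.
    """
    i = bisect.bisect_right(restore_points, target_time)
    return restore_points[i - 1] if i else None
```

With restore points every 15 minutes, the recovered state is at most one RPO interval older than the requested moment, which is exactly the trade-off described here.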
When we are asked to restore to some moment in time, we just pick the closest restore point that was created before that target. OK, I guess that was easy. We now have Greenplum cluster backups, and that's great, but we want to create good backups, so let's get back to our three criteria.

The first part is reliability, and here is how our Greenplum cluster backup is organized. Initially we call a single command, wal-g backup-push, on the coordinator. What it actually does is spawn a bunch of SSH sessions, one per segment, and on each segment it just runs a usual Postgres backup. I think that's a simple approach, but what is the problem? The problem is that, unlike Postgres (well, Postgres too, actually), Greenplum clusters might require a couple of days to create a full backup, because they can be huge in size. And since we are running in a cloud environment, which is not perfect, there can be interruptions: SSH sessions may be terminated. So we needed some way to compensate, and we simply switched to polling. When we start our backup, we remember the PIDs of all the segment backup processes, and then we check them from time to time to see their current state. Once all the segment backups have finished, we consider the backup successful. Conversely, if for example some segment backup has failed, we consider the whole backup a failure, and we are able to gracefully terminate all the running backups and return to the starting point, leaving no traces. Restoring works quite the same way, using the same SSH scheme, so I will not focus on it.

Now we can sleep a little better, because we know that our backups can be created even if there are some disturbances in the network. Are we good to go? Not yet, because we have part two: low-cost backups. Let's look at how a Postgres backup is organized in the cloud storage. As you can see, each backup holds its own collection of tar files, and these backup files are not connected with each other: there is no single backup that shares any data with a different one.
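The polling rule from the reliability part above can be sketched as a tiny state-folding function. This is a simplification: real WAL-G tracks the segment backup process IDs over SSH, while here a dict of states stands in for that.

```python
def cluster_backup_state(segment_states):
    """Fold per-segment backup states into one cluster-level decision.

    segment_states maps segment id -> "running" | "done" | "failed".
    Any failure aborts the whole backup so the remaining segment backups
    can be terminated gracefully, leaving no traces; success is declared
    only when every segment has finished.
    """
    states = segment_states.values()
    if any(s == "failed" for s in states):
        return "abort"
    if all(s == "done" for s in states):
        return "success"
    return "keep-polling"
```

The coordinator would call something like this on every polling tick, which is what makes the scheme robust to dropped SSH sessions: the decision depends only on observed segment state, not on a long-lived connection.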
And, surprise: a Greenplum segment backup is exactly the same, because we just borrowed the entire procedure. But here comes the problem: since Greenplum is meant to store a lot of data, total backup sizes become unacceptable. For example, if you've got a 100-terabyte cluster, you might not be willing to pay for 700 terabytes of S3 cloud quota usage, and that's why we are interested in reducing this size. One approach we can use is called data deduplication, and we can use the fact that, unlike in Postgres, the vast majority of the data in Greenplum is cold data that is written rarely and afterwards only read. So we can store this vast majority of common data once and share it between different backups. The solution is simple: when we are making a backup, we just check whether a file belongs to an append-optimized table or not. If it does, it goes to the common storage; if it doesn't, it goes to the single backup's own storage. Here is how the Greenplum backup layout version 2 looks: as you can see, there are two backups that are actually sharing some data, the append-optimized segments. On this slide there is a comparison between the different schemes: the first one, V1, is the initial one, and the second one, V2, is with deduplication. These are more or less average numbers, and we mainly noticed at least a 50 percent decrease in cloud quota storage usage.

Now we are fine, I guess: we don't occupy that much space and we don't stress the network too much. But we can do better than that, because we can see cases where deduplication is not enough. An append-optimized segment in the common storage can be, for example, one gigabyte in size, and if you write even 10 kilobytes to the tail of that file, you will have to re-upload the entire append-optimized table file to the storage again, which is not efficient: you end up storing two copies of data whose only difference is that tiny tail.
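The deduplication routing described above can be sketched as follows. The dicts stand in for S3 prefixes, the content-hash key is one plausible way to share identical files between backups, and all names are illustrative rather than WAL-G's real layout.

```python
import hashlib

def store_file(name, data, is_append_optimized, common, per_backup):
    """Route one data file during backup creation.

    Append-optimized (cold, rarely rewritten) files go into a shared pool
    keyed by content hash, so an identical file is uploaded once and then
    referenced by many backups; everything else lands in the backup's own
    storage.
    """
    if is_append_optimized:
        key = hashlib.sha256(data).hexdigest()
        if key not in common:          # upload only if nobody stored it yet
            common[key] = data
        return ("common", key)
    per_backup[name] = data
    return ("backup", name)
```

Two backups taken a day apart would then both reference the same shared object for an unchanged append-optimized segment, which is where the observed 50 percent storage reduction comes from.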
To create incremental backups, we might try a solution borrowed from Postgres, because WAL-G already has page-based increments for Postgres, and they work for heap tables. But Greenplum also uses its own database table storage engine, called append-optimized, so we needed to invent something else. Let's look at the very nature of those files: the notable difference is that, unlike Postgres files, append-optimized files are only ever written at the tail. So when we are creating a delta backup, we can just reference the previously written backup and upload only the tiny tail to our cloud storage, and we can reuse this approach for subsequent delta backups. Now we don't have to store those two huge segments; we can just create deltas. How much do we save? It actually depends on your retention period, on how frequently you want to upload the entire data just in case; maybe you want to upload the entire data only once a month, keeping 30 days in your retention. But for most of our clients it is at least a 65 percent saving in S3 cloud quota usage.

So we've got three nice fancy features; are we good to go? Well, there is the last part, and it's called quick to restore. I'm not really trying to compare our tool with gpbackup here, because it's just a different backup system: with our physical backups it takes eight hours to restore an 80-terabyte cluster, and I don't know how many days such a restoration would require with gpbackup, but it would be a much, much higher number than eight hours. What do we actually have left to do? Resource throttling is still per segment, and ideally it should be cluster-wide, because if you've got an imbalanced Greenplum cluster, you might have one really tiny segment and another really huge one, and per-segment throttling does not take that into account.
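Stepping back to the append-optimized deltas for a moment, the tail-only scheme can be sketched like this, with byte strings standing in for append-optimized segment files (a simplification of the real on-disk format):

```python
def make_delta(current, previously_backed_up_len):
    """Append-optimized files only ever grow at the tail, so a delta
    backup stores just the bytes past the length captured last time."""
    return current[previously_backed_up_len:]

def restore_from_deltas(base, deltas):
    """Rebuild the file by replaying the tails onto the base copy."""
    data = base
    for delta in deltas:
        data += delta
    return data
```

So a 10-kilobyte append to a one-gigabyte file costs a 10-kilobyte upload instead of a full re-upload, and a chain of such deltas restores to the exact latest file.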
Another important feature is that we might want to restore only a single database or table, because if your backup is, say, 80 terabytes in size and you want to restore only a table that is 100 megabytes, you don't want to download the entire backup. And the good news is that this was actually done, maybe two weeks ago. So we keep improving, and we strive to create the best backups for Greenplum.

There is also a bonus part, because if you run backups of your Greenplum cluster with WAL-G, you can use this common append-optimized segment storage not only for backup purposes: Greenplum itself can use this append-optimized common storage, because you can offload cold data from your database hosts and download it from S3 only when there is an actual need to read it. It's quite similar to Snowflake. Of course there were some issues along the way: we encountered some strange errors from Greenplum and fixed some strange bugs. Initially they didn't support physical backups at all, and they told us: well, there are no guarantees, we are not committing to that. And of course there were some bugs, but the good news is that we fixed all of them and added tests to upstream Greenplum. The really good news is that after we released our Greenplum physical backups with WAL-G, Pivotal actually confirmed that they now officially support physical backups, and on top of that they recently released their own physical backup tool. I'm not going to compare these tools in detail; it obviously has fewer features, but it's great news that Pivotal officially supports this approach, so it won't break in the future.

Let's now switch to the final part: how to use WAL-G and how to restore your first backup with it. You can do that in only three simple steps: you configure WAL-G, then you can just
create a backup and restore from it. But to restore from a backup you need a special restore configuration file that describes the desired cluster configuration: you can specify where you would like to put each segment, which data directory, which port, which host. After you've created such a file, you just call wal-g backup-fetch with LATEST, and that's it: you now have a restored cluster. It's that simple. We develop and maintain WAL-G in a GitHub repository located at this link, and to be able to do physical backups of Greenplum you need to meet two requirements: WAL-G should be version 2.1 or higher, and Greenplum should be 6.9.2 or higher. That's actually it. I was trying to be compact because we had some technical difficulties at the start, so I would appreciate your questions, and I'll be grateful to see some of you at the afterparty.

Well, thank you so much for stealing part of my job. All right, a round of applause, thank you. You just stole part of my official job, so to say, but that means I have fewer things to say now. The part remaining for you is questions, so yes, I see a hand over there in the center, please.

Hi. How do you achieve consistency between those segments? I'm not fully aware of the architecture, I must say, but if you have a coordinator and then those segments, how do you set the restore point and how do you make sure that it's the same on all segments, on the backup, or does it matter?

Yeah, of course it matters. We use a special extension called gp_pitr, and the main responsibility of that extension is to create restore points, which are initiated from the coordinator. What it actually does under the hood is wait until there are no two-phase commits running in the system, and it forbids any new two-phase commits until the restore point is created. That's how it's managed from the coordinator.
After this moment appears, we just write a usual restore point record into the WAL itself, and when we restore the backup, each segment replays its WAL until that point and then stops. That's how we achieve it.

Is there a failure point there, or can it fail?

Well, we create these restore points every hour, and we didn't notice any frequent failures in production. There were some fears that they might interfere with the user-defined load, but they are lightweight to create, so in practice there are no conflicts and the clients are happy.

Thanks. Excellent. Any other questions? Yes, a few rows in front.

Thank you for your talk. I also have a couple of questions about the backup application scenario. If I understand it correctly, these shards, or segments, are somehow independent: they have different data and run on different machines. Suppose some disaster happens on one segment and you have to apply a backup, but the other segments and their data are safe. Do you roll back to the latest checkpoint anyway and lose the data even on the segments that were not corrupted, that didn't suffer?

Well, no. We just use the physical archives: we restore the failed segment, and then it can replay the WAL up to the desired point. And it's also OK because Greenplum won't commit any new changes while you have a failed segment, so the failed segment will just restore from backup and replay all the WAL, if it has been archived correctly, and this basically makes sure it's in sync.

So it tries to restore the failed segment to the last point, not to the checkpoint?

Yes, not to the restore point, but up until the end of the WAL.

OK, and if the disaster is so severe that you lose your write-ahead log?

Yes, then you're in trouble, and you have to restore the entire cluster to the restore point, because there is no redo beyond the archived WAL in Postgres or Greenplum
whatsoever. So yeah, then you're out of luck, unfortunately.

And a little follow-up question. Suppose you lose your write-ahead log on one segment, so you are not actually able to restore even to a checkpoint, because, if I understand it properly, a checkpoint still uses a part of the write-ahead log after the backup. So how do you actually restore your cluster if some segment doesn't have its write-ahead logs, if the whole data burned in the fire?

I'll just interrupt: can you answer that in the seven seconds you have remaining?

Actually, it is the same problem that Postgres has: if you have a backup and you want to restore to some point, you are obliged to have all these WALs in your cloud storage to restore correctly. So it's the same.

Right, and please continue with this, we're really running out of time, so this is an excellent opportunity to continue that discussion in the hallway, before the reception, the social event. So, the last topic, the last task for you remaining: who gets the prize? Well, I guess the last person was really persistent, and he asked really interesting questions, so it's the gentleman in the third row. All right, thank you, if you could just raise your hand, yes, over there. So, with this round of applause, this is the end of the last session. Thank you, Daniil, for presenting this. Now an announcement: we will have the official closing of this conference in hall one, that's across the lobby on the other side, and that is going to happen in about five minutes. So if you are interested, well, it's not that you may be interested, you must be interested, right? The official closing ceremony will be there, and afterwards we have a social event, a reception, and free time for interaction and networking, discussions, and all the follow-up questions that you didn't have an opportunity to ask the speakers, just while mingling
with your fellow colleagues. Thank you so much. Thank you.
Info
Channel: Tech Internals Conf
Views: 92
Keywords: HighLoad++
Id: SRA-C2KNs0I
Length: 35min 20sec (2120 seconds)
Published: Fri Mar 15 2024