Efficiently backing up terabytes of data with pgBackRest. David Steele (Crunchy Data)

Captions
So my name is David Steele, as you probably know by now if you speak Russian. I'm the principal architect at Crunchy Data; we're based in the United States and do most of our work there, although we're reaching our tendrils into Europe a little bit as well. Today we're going to be talking about pgBackRest, which is a tool for doing backups correctly and efficiently while dealing with very large volumes of data.

I've been giving this talk for almost three years now, since pgBackRest was an early beta — you might even call it alpha — even though there were companies running it in production because they had no choice; it was the only thing big enough to do what they needed. What I found the last time I gave this talk is that the feature set has now gotten so large that I've outgrown it. It used to be a review of the feature set and then a demonstration; then I outgrew the demo — you're much better off just going to the user guide on the website, there are tons of examples there — and in my last talk I found I'd even outgrown covering the entire feature set. There simply isn't enough time in a talk like this to do it justice. So at Ilya's suggestion I'm going to take a slightly different format this time: I'll go through some of the slides fairly quickly, and then spend more time on the slides and features I find particularly interesting, and give you a little more insight into the architectural decisions — why we did things the way we did, what we're trying to accomplish, and how it might help you. If I go past something really quickly that you're interested in, just let me know and I'll go back. If you want to ask questions during the talk, that's fine; I want this to be a bit more of an open dialogue than in the past.

So let's get started. Here's our agenda. We'll talk, really quickly, about why you should back up — I don't think this should be a big sell, but we'll go through some of the reasons and the benefits, things you can actually get from it. We'll talk briefly about the concept of living backups, and then we'll go into design, features, and performance. We'll also talk about changes we've made to core Postgres to make backup work better, and things that are in the pipeline.

All right, so backup. We'll go through this pretty quickly. You've got the obvious things: hardware failures; the WAL archive to keep your replicas fed — you may not always be able to get the WAL from the master, so the WAL archive can be a great place to get it; you can build your replicas from your backups instead of from your master, reducing load on the master; and you can detect corruption and fix it. Right now the only way to know you have corruption is maybe to try to pg_dump your entire database, or to get lucky — if the corruption is in an index you may not find it. pgBackRest will detect page-level corruption for you and let you know when it happens, so you have the choice of going back to an older backup and restoring. You might also have had an accident: maybe someone accidentally dropped a table, or deleted your most important customer account, or that sort of thing.
Development: there's no more realistic data than production, so unless there are government or privacy reasons why you can't do it, using your production database for development can be really beneficial. Also reporting: rather than reporting off of your primary, you can stand up a reporting server — especially if your reporting is daily — sync it up or do a restore and then disconnect it; you can use temp tables and all that good stuff and run your reporting there. So there are lots of ways you can use your backups, not just for disaster recovery.

And in fact — sorry, I forgot about this slide — this is extremely important: if you are not restoring your backups, and you're not thinking about your backups in terms of a full life cycle, they probably will not work when you need them. Not only may there be something wrong with the backup, or you discover that the cron job was disabled six months ago by your newest employee — no one is going to be familiar with the process. You need to be familiar with the process, and to do that you need to find a way to make backups useful. Get them into your enterprise in one of the ways we discussed, and remember this very important thing: if you've got code paths or procedural paths that aren't being followed regularly, they're almost certainly out of date and don't work. You really don't want the first time someone does a restore to be a mission-critical moment when your primary server has gone down. You want the people in your admin group to be familiar with doing backups and restores and with bringing up replicas; it should be second nature to them, so that when something bad happens you can deal with it quickly, effectively, and with confidence. A lot of times when things go wrong it's because people aren't confident: they get nervous, they make mistakes, they get more nervous, they compound those mistakes. The more confident people are, the more likely you are to get a good restore and recover from disaster quickly.

That's pretty much the end of my lecture on why you should be backing up. There's a pretty good crowd here, so maybe I don't have to convince you, but think about integrating backups into your enterprise, and how you can make them living backups — full lifecycle — not just for the chance that your master goes down, which frankly with Postgres doesn't happen that often. Most people are able to recover their masters, so that's not your general use case; it's your worst-case use case.

So let's talk about pgBackRest design. Why was pgBackRest started? It was started because the primary tool used for Postgres backup is rsync. Whichever tool you're using — you might be using something like Barman, or something homegrown — most backup solutions are built on top of rsync, and rsync has a number of limitations. The biggest one is that it's single-process. That's a problem for very large databases; it's just not fast enough. It also has only a one-second timestamp resolution, so if rsync is copying something and Postgres is modifying it at the same time, your differential backup won't pick up that change. rsync is not safe for differential backups unless you use checksums — if you turn on checksums it's fine, but timestamp-based differentials are not safe.
The other thing is the way incremental backups are done with rsync: it requires the previous backup to be uncompressed, so now you've got at least two uncompressed backups in your backup set. If your database is very large — 15, 20, 25, 30 terabytes — that's an extraordinary burden, because database data compresses fairly well; I regularly see 75 to 80 percent compression. It's just unacceptable to have uncompressed backups lying around.

So the idea with pgBackRest — and don't get me wrong, I've been working with Postgres since 1999 and I was using rsync until the last three years; it's seductive, it's easy, it gives you most of the functionality you need, and when databases were smaller it worked — the idea was to go back to fundamentals and build a backup solution designed for Postgres, for databases. rsync was not designed for databases; it was designed for moving archives around the internet and creating mirrors. A database does not work that way; they're not the same problem set. So essentially we decided to get rid of all the typical backup tools — tar, rsync, all of it — start from scratch, and write our own. We also wanted to natively support local and remote operation, which we felt was extremely important, and to solve the timestamp resolution issue, which actually turns out to be really easy; we'll talk about that later.

You can read through these bullet points, but the core of pgBackRest has always been about doing things in a multi-process way. Copying files is an excellent place for multiprocessing: you can easily generate a manifest, which is very quick, and then divvy those files up between a number of processes. We actually do it dynamically, so no process gets too much data; processes are fed data as they go, so the load stays even across the board. You start with the biggest files and work your way down to the smallest, tablespaces can be split among different processes to optimize I/O, and so on. In everything we do in pgBackRest we're thinking about multiprocessing — how can we parallelize this problem and make it more efficient? Not everything in pgBackRest is parallel — we'll talk about some of that — but everything can be parallelized, everything is designed in that direction; in some cases we just haven't implemented it yet.

It's also extremely important that we support local and remote operation. You might be backing up to NFS — we'd consider that local operation. In some cases people back up locally and then have a backup server pull those backups off and move them away. You might have your own backup server running pgBackRest where you centralize all of your backup scheduling. You might be backing up to S3. There are lots of options — local, remote, combinations of those. You can even have a backup server which then moves things to S3, so you can centralize your configuration and your cron jobs. It's extremely flexible; there are lots of ways to configure it to get your backups where they need to be for maximum safety.
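As a rough illustration of that flexibility — a minimal sketch that isn't from the talk, with hypothetical hostnames, paths, and stanza name; option names follow the current pgBackRest documentation and differ slightly between the 1.x and 2.x series:

    # /etc/pgbackrest.conf on a dedicated backup/repository server
    [global]
    repo1-path=/var/lib/pgbackrest     # repository location: local disk, NFS mount, etc.
    process-max=4                      # parallel processes for backup and restore

    [main]
    pg1-host=db1.example.com           # database host, reached over SSH
    pg1-path=/var/lib/pgsql/10/data    # PGDATA on that host

    # create the stanza once, then run backups from cron
    pgbackrest --stanza=main stanza-create
    pgbackrest --stanza=main --type=full backup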
We do full, incremental, and differential backups. Does anyone here know what a differential backup is? OK — I didn't really know what a differential backup was before I started this project either. A differential is an incremental that's always based off the last full backup, so if you want to reproduce a backup from a differential you only need two backups: the differential and the last full. Incrementals, by contrast, can be based off each other, so you can have long chains of them; they can be efficient, but they're also harder to expire because of all the dependencies — you have to expire entire chains before you can get rid of anything.

As I said earlier, pgBackRest is not susceptible to the timestamp resolution issue. The solution turns out to be very simple: you build the manifest, which is the list of all the files you're going to back up, and then you wait until the end of the current second to start copying. That's it. If Postgres modifies files during that period, that's fine; if it modifies them the next second they'll get a new timestamp. Then you start copying. There's no real magic to it — but again, rsync was not designed for backing up databases, and it doesn't do this.

Built into pgBackRest is backup rotation and archive expiration, so you can set retention limits for your backups and decide how much WAL to store. You can store WAL for all your backups, or just for the most recent ones. If you do the latter, the only thing pgBackRest will expire is the WAL between backups, because the WAL collected during a backup is required to make that backup consistent — that set is always kept, and you'll start to see gaps appear. That means you can still recover a consistent backup from the older ones, but you may not be able to do point-in-time recovery to an arbitrary point from them. I encourage people to keep all their WAL if they have the space.

Backup integrity — this is something we have spent a lot of time thinking about. Earlier I said that if you're not restoring your backups they're probably no good, and philosophically that's true, but we've done a lot to make sure that if pgBackRest says you have a backup, you really do have a backup. One thing we do is validate all the Postgres page checksums. This only works on 9.3 and above, and only if you enabled them at initdb time — so if you have an older database that you've been pg_upgrading, you're not going to have checksums. They're also not on by default. If you're doing an initdb, I encourage you to always use -k; the overhead is very small and the payoff is very big in terms of knowing whether you've got bit rot, dropped pages, a SAN problem, a controller problem. Now, pgBackRest does not error out when it finds a bad checksum; it puts warnings in the log and the backup completes normally. But at that point you can choose to go to an older backup and recover, because maybe the error was only on disk and not in your WAL stream, so recovering an older backup and replaying may eliminate the corruption. The important thing is that at least you know what happened — maybe there's nothing you can do about that one block, but at least you know your controller or your disk is bad if bad blocks are popping up — and you can deal with that problem early instead of two years down the road when bad blocks are peppered through your entire database.
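For reference — not from the talk — enabling page checksums at cluster creation time looks like this (the long form of -k is --data-checksums; the data directory path is just an example):

    initdb -k -D /var/lib/pgsql/10/data    # -k / --data-checksums enables page checksums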
We also calculate checksums for every file in the database — SHA-1 checksums — and these are rechecked on restore, so if the checksums don't match, the restore won't work and you'll be notified. If, say, some corruption appears in the repo after the backup is complete, this helps detect it.

pgBackRest also checks after the backup that all the WAL segments made it to the repository. If you don't have those WAL segments, you don't have a backup, so this is very important; we verify those segments are there for you. You can turn it off — there are cases where you're doing your own archiving or something else — but the default is to check.

We also use a really simple backup format: inside the pgBackRest repository the clusters are stored just as if they were a Postgres cluster, including the tablespace links. So if you're using something like ZFS, you can snapshot the pgBackRest repository and bring the database up in place. You still have to do WAL replay, so it may take a little time, but you don't have to make a copy of the database to bring it up. For very large databases this can be a big advantage: the backups can be compressed with ZFS, you can bring them up in place, dump out development data, or pull that table or account that somebody deleted, without having to stand up the entire system. This is another way we think about terabyte-scale databases — how do you work with them effectively and efficiently?

All operations use file- and directory-level fsync to ensure durability. The interesting thing is that I was working on this at exactly the same time that, in Postgres, we were cleaning up areas where directory fsyncs were not being done correctly, and I was talking to Andres and others at the time to make sure the same protections got into the backup tool. You do not want the backup tool to tell you that a WAL archive has been created when it hasn't — that is extremely bad — so you want that to be correct.

Resume — I'll just gloss over this, but quickly: if a backup fails, pgBackRest can reuse it. The next time you schedule a backup it will reuse as much of the failed backup as it can. It still has to do checksums, but it reduces the load on your master on the next run, so it can be very handy.

This is more of a philosophical point: in pgBackRest, everything is done in stream. Nothing is ever done at rest. If you're pulling a file from your database, it lives in memory until it sits on disk in compressed form at the end. Checksums aren't computed on files at rest, and so on. Compression isn't done more than once: if your transport is SSH, for instance, we disable compression in SSH; compression is done on the database server, and that compressed file is stored on the destination. When the data comes back during a restore, the original compressed file is transferred and decompressed on the database server. Files are touched as little as possible, and operations are done as few times as possible — primarily for performance.
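A quick aside that isn't in the talk: the archiving setup and repository can also be verified on demand, and the post-backup WAL verification described above is controlled by the archive-check option per the current docs — a minimal sketch with a hypothetical stanza name:

    pgbackrest --stanza=main check    # verifies archive_command and the repository are working
    pgbackrest --stanza=main info     # lists the backups and WAL stored in the repository

    # in pgbackrest.conf, archive-check=n disables the post-backup WAL verification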
Delta restore — this is a very interesting feature. I always tell people: back up as slowly as you can. You don't need to back up quickly, and you don't want to put any more load on your system than necessary to do your backups. If your weekly full backups take a day, or even two days, that's OK — that's what the WAL stream is for. If your daily differentials or incrementals take a couple of hours, that's fine. Don't devote any more resources to backup than you need to. The thing you do want to be really fast is restore, especially in a DR situation: your primary server has gone down and you're building a new primary. Hopefully you can just fail over to a replica — that's the usual situation these days — but when you do need to restore, parallel delta restore is a really great tool.

What it does is compare the manifest against the disk. First it deletes anything on disk that isn't in the manifest — that part is easy. Then, for all the files present in both the manifest and the local system, it computes checksums in parallel; you can define the degree of parallelism. It works well because you get sequential I/O and you're spending CPU on the checksumming — I've done this up to 64 ways on big systems — and it very quickly determines what it needs to pull from the backup. Every time it finds a checksum that doesn't match, it pulls that file from the backup and continues. So you can actually have a 10, 15, 20 terabyte database restore in a few minutes. It depends on your delta: if you're changing every segment of every table every minute, your delta is going to be very big and it's going to take some time. In theory the delta method could take longer than a clean restore if everything had changed, but in practice that's not how it works — large parts of the database stay static, especially in data warehouses; multi-terabyte databases tend to be partitioned and tend to have a data-warehouse shape, so you can get a lot of efficiency here. This has been used to great effect by many people. It's another area where we're thinking about how to deal with big data: what are the effective, efficient ways to get things backed up, restored, and moved around as quickly as possible?
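A minimal sketch of what that looks like on the command line (stanza name hypothetical; process-max controls the degree of parallelism described above):

    # compares checksums against what is already on disk and only pulls files that differ
    pgbackrest --stanza=main --delta --process-max=16 restore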
Another interesting area: when I started this project — like a lot of you; who here has written their own archive_command? Show of hands? A couple of people — when you start with an archive_command it looks pretty easy. Postgres calls this, I just move this file over there, it can't be that hard. But as you get into it, and especially with software that many people are using, you start to find all the edge cases. What happens if the file gets written properly, but the packet notifying Postgres is lost on the way back over the network? Postgres will try to push that file again because it doesn't know that it succeeded. Now you've got to determine: is the file I already have the right file? Should I keep what I have, or should I give an error because I've got a WAL file with the same name but a different checksum? You want Postgres to silently succeed if it was just re-pushing the same file, and you want a failure if something is pushing from another database, or a database on a different timeline, or some other horrible thing has happened that indicates an error. People love to copy WAL segments around more or less randomly, and that causes all kinds of interesting problems for archive functions.

So we put in a whole bunch of checks to make this work. First of all, we have dedicated commands for pushing and retrieving WAL — rather than saying "use rsync or cp or whatever," it's all built in and part of the pgBackRest system. The push command, as I said, automatically detects when WAL segments are pushed multiple times and deduplicates them; otherwise it gives an error. Both commands check that the database and repository match: when you create a pgBackRest stanza in the repo, it pulls critical information about the database — the system identifier and the version number — and stores it in the repo, so it knows whether the correct database is talking to it. It's really common for people, say when setting up a new repo or a new server, to copy the config file and forget to create a new stanza or rename it. If you do that under this system, the WAL just gets rejected; it won't let you put one database's WAL into another database's repository. Extremely important.

And then the big thing: since the beginning of the year we've supported parallel archiving. This is actually done asynchronously, but from the standpoint of Postgres it looks synchronous. The way it works is that any time Postgres calls the archive_command and asks for a WAL segment to be pushed, pgBackRest spins up a background process, and that process peeks inside the archive_status directory in Postgres to see if anything else is ready to be archived. Postgres creates those little .ready files on disk to mark which segments are ready, so we can use them to look ahead to segments Postgres hasn't requested yet and push them too. Meanwhile, the original process that fired off the async worker returns immediately if it finds an indicator that the WAL was already archived. So Postgres keeps calling the archive_command, the async process keeps working in the background, and the two cooperate to get your WAL copied off as expeditiously as possible.
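For reference — a minimal sketch, not from the talk, with a hypothetical stanza name and paths; option names follow the current docs (asynchronous archiving also needs a spool directory):

    # postgresql.conf
    archive_mode = on
    archive_command = 'pgbackrest --stanza=main archive-push %p'

    # /etc/pgbackrest.conf
    [global]
    archive-async=y                     # enables the look-ahead background push described above
    spool-path=/var/spool/pgbackrest    # local spool used to track async archiving status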
You might be asking — it's a pretty common question — do you support pg_receivexlog? Currently the answer is no. We are planning to, although we're going to interface directly with the protocol layer rather than using the pg_receivexlog binary itself. The reason we haven't yet is that there are a lot of performance limitations built into that model, and only in recent versions of Postgres have the pieces been added that will let us sidestep those issues; it may even be version 11 before I can get it working to my satisfaction. The method we have today is extremely fast, so even very high-volume databases can use it: compression is parallel, all the protocol work is parallel, and if you're pushing off to S3 that can be parallelized too, which matters because S3 per stream can be really slow — just creating an empty file in S3 takes about one second. Parallelize it and you get real performance benefits. But currently, no pg_receivexlog, although we are working on it.

I'm going to gloss over this, but tablespaces and links are fully supported. If you've got pg_xlog linked off somewhere, or pg_stat linked off, or whatever, you can back up and restore databases with basically any topology.

All right, here's an interesting feature, and one that was frequently requested. The question I've gotten many times over the years is: can I restore a single database out of a cluster? My answer was always no — not from a binary backup; you could do a logical backup with pg_dump, but not from a binary backup. But Stephen Frost and I sat down and figured out a way to do it. Essentially, we restore all the files for the databases you're interested in — we can do this because we store the catalog of databases when the original backup is taken, so we know where everything lives on disk — and everything else is restored as sparse, zeroed files. Initially those files take up zero bytes, although they will of course take up some directory space. Then you replay the WAL. The interesting thing is that if the WAL makes modifications to the databases you're not interested in, those are replayed into those files and the files start to grow — so the databases you didn't restore grow by the amount of change contained in the WAL you're replaying, which could be a lot or a little depending on your use case.

[Audience question] Do you need to turn on full page writes? Good question. Full page writes are on by default in Postgres, but you can turn them off, because some filesystems, like ZFS, claim to handle this problem for you. I'm a little skeptical, because I haven't seen proof of it — you'd have to prove a negative, prove that it never fails, which is very difficult. But even if you have full page writes turned off, Postgres will enable them during the backup; they have to be on for the backup to succeed, because at that point you don't know where the data is going or whether you can trust that filesystem to preserve things. So good question, but no, you don't have to worry about it; it's built in.

When the WAL replay is done, the databases you wanted are fully up to date, and the databases you didn't want are in a strange state: mostly empty, but possibly containing some update or insert records from the WAL. I know that's a little weird, but you can't connect to them — I've made sure of that. If you try to connect to one of those databases you just get an error; the only thing you can do is drop them, it's the only command allowed. A little hacky, yes.
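A minimal sketch of what such a selective restore looks like (database and stanza names hypothetical):

    # 'sales' is restored normally; the other user databases come back as sparse,
    # zeroed files that can only be dropped once recovery completes
    pgbackrest --stanza=main --delta --db-include=sales restore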
We've thought about perhaps getting pgBackRest to go back after the recovery and clean things up, but it's actually a lot harder than it sounds, because there are really two parts to a database restore. There's the actual restore, which pgBackRest does: it pulls all the required data from the repo, writes the recovery.conf, and sets up the restore_command — and then pgBackRest stops, and that's the end of its responsibility. Then Postgres starts up, begins requesting WAL segments from the repository, and replays up to the point in time you're recovering to — and pgBackRest is not involved at that point at all. We'd have to figure out how to re-inject pgBackRest into the process. For now, the people who want this feature are fine with it; they're generally not concerned about the other databases, so they're happy to drop them or just leave them. Often they just want some data and don't want to recover everything — some people run thousands of customers inside a single cluster and don't want to do a full restore every time.

[Audience question about table-level restore] Yes, in theory. The reason we haven't done it yet is that on disk it's very easy to tell where databases live, but much harder to tell where tables live. To do that you need to start poking into the filenode mapping, because the IDs on disk can't simply be derived from the table name — you have to query the database and go from there. We are actually looking at table-level restore as well, and it would work under the same philosophy, but we'd have to store a lot more information to identify that particular table plus all of its forks: the table itself, then the free space map and the visibility map — and that sounds easy enough, but what about TOAST? Those have different IDs, so you need to pull in all the TOAST tables as well. It gets a bit complicated, and for most people a database-level restore was enough, so we've stopped there for now and moved on to other important features. But it's a good point, and yes, it is possible — just more complexity, as always.

All right, backup from standby. This is a very important feature for many people. Postgres has a backup-from-standby facility, so you can start a backup on a replica and take your backup from there. We did not use it; we came up with our own backup-from-standby method — I spent several months researching it. Essentially, we create a backup that looks just like it was taken on the master: we start the backup on the master, wait until the replay location on the standby reaches the backup's start location, and then copy all the replicated files from the standby and everything else from the master. What you get is a backup that looks exactly as if it had been taken on the master, but most of it actually comes from the standby. Some things are copied from the master — mostly small stuff, but, for instance, if you're keeping your Postgres logs inside the data directory on the master, it will copy those as well. I always encourage people to store their logs in a separate directory, not in the Postgres data directory.
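A rough sketch of that configuration — hostnames and paths hypothetical, option names per the current docs (1.x releases used db-host/db-path style names instead of pg1-/pg2-):

    # /etc/pgbackrest.conf on the backup server
    [main]
    pg1-host=master.example.com
    pg1-path=/var/lib/pgsql/10/data
    pg2-host=standby.example.com
    pg2-path=/var/lib/pgsql/10/data
    backup-standby=y    # start/stop on the master, copy replicated files from the standby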
One of the reasons we went to all this trouble — and we haven't coded this yet; as I said, we're always thinking about parallelism even when we can't implement it yet — is that, first, we think it gives you a better backup, and second, it will allow us in the future to back up from multiple standbys. Then we can use all your available standbys to do the backup and spread the load as much as possible, and when you combine that with delta restore — highly performant ways to do both sides — you can see the advantages. So far we haven't done it. Sometimes people ask why we haven't implemented things that look easy to implement, and my answer is always: testing. We have a very strong testing ethic on the pgBackRest project; essentially we don't code anything we haven't written tests for. Even though this would be relatively easy to code, we have to test it and ensure its correctness, and testing is often the bottleneck. For any given feature we spend about 80 percent of our time writing, verifying, and reviewing test code rather than the actual code — the pgBackRest code base is relatively small compared to the test code base. So a lot of the time, if something isn't implemented, it's because we haven't had time to write the tests yet; if I can't verify it's correct, I don't want to push it out there.

S3 support — anyone here interested in S3 support? A couple of hands, good. This is a very frequently requested feature, and part of the delay was that the existing client libraries don't really subscribe to the pgBackRest philosophy of never doing anything at rest, doing everything block-wise and in stream, and so on. So I basically had to write my own interface to S3. I used an HTTP client — I didn't start completely from scratch — but we implemented the S3 protocol ourselves so we could make sure it was as fast and efficient as every other part of pgBackRest. We abstracted the storage layer, so whether you're writing to POSIX — there's also a CIFS option, which just disables certain fsyncs and directory syncs that don't work on Microsoft filesystems or S3 — you get all the same features, because pgBackRest talks to that abstraction layer and, as far as it's concerned, doesn't care what's below it. The data could be local or far away; pgBackRest thinks S3 is local, because all it sees is the storage driver, and it just tells it to store or retrieve things. And it's very efficient. This is another area where parallelism really pays off, because S3 itself is quite slow at creating new objects, but there are thousands or hundreds of thousands of S3 servers, so by parallelizing we can spread the load across them and get a lot of performance back.

And we support Postgres 8.3 and above. If you think no one is using 8.3, you'd be wrong; there definitely are people using it. In fact, last week we were working with someone running 8.3 with tablespaces, and it's a two-and-a-half terabyte database, so not even a trivial installation. Anyone here running 8.3? 8.4? 9.0, 9.1? OK. The versions I named are all end-of-life — the Postgres project doesn't support them anymore — but older databases still need to be backed up. They're people too; they need some love, so we try to give them some love.
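Going back to the S3 support for a moment — a minimal sketch of an S3-backed repository, with a hypothetical bucket and placeholder credentials; option names per the current docs:

    # /etc/pgbackrest.conf
    [global]
    repo1-type=s3
    repo1-path=/backups
    repo1-s3-bucket=my-pgbackrest-bucket
    repo1-s3-endpoint=s3.eu-central-1.amazonaws.com
    repo1-s3-region=eu-central-1
    repo1-s3-key=<access-key>            # placeholder; use real credentials
    repo1-s3-key-secret=<secret-key>     # placeholder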
Let's talk about performance a little bit. This is an older slide — I've made some improvements since — but the general numbers have stayed the same. What I was trying to do here was compare pgBackRest to rsync; since rsync doesn't do compression, the comparison is rsyncing the files and then compressing them afterwards. You can see that with a single process and no destination compression — meaning you're only compressing on the network — rsync is faster. rsync is a more efficient implementation, and it also isn't doing SHA-1 checksums, which cost something. But as you start to parallelize, things get faster really quickly, and they just keep multiplying.

pgBackRest uses separate processes for compression. We used to use threads; threads were actually fairly efficient for compression, but the signaling — the messaging needed to pass work around — was extremely inefficient in Perl threads, so we moved to processes and now use RPC for everything. The scaling is now essentially linear: if your disks can supply the data, you can run as many cores as you want. The most we've recorded to this point is 80-core backups, at an installation so large and so busy that they can't use replication, because Postgres replication won't keep up. What they do is, every four hours, take a complete backup of their master and do a delta restore to their two replicas — that's how the system works, because single-threaded replication replay simply will not keep up. It's an extreme situation, but it gives you an idea of how scalable this is if you have the resources. In reality you can clog up your network very quickly: if you're unfortunate enough to be on a one-gigabit network, you can only run four to eight cores; beyond that you simply saturate the network. If you're running on good old-fashioned spinning disks, which many people still are, I/O will be your bottleneck. These days systems tend to be core-heavy, and pgBackRest uses cores very efficiently, so you can easily have more cores than your disk or network I/O can accommodate — but if you find the right balance you can back up extremely quickly. Although, as I said earlier, don't back up any faster than you have to; don't put that strain on your system if you can avoid it.

In addition to working on pgBackRest, I've also been thinking about how we can make Postgres itself better with regard to backup. Some of the things contributed recently that I've either written or contributed to: excluding certain files and directories. There are a number of files and directories in the Postgres data directory that are rebuilt on restart, or can be. Last year I went through and audited the code, identified all of them, and worked with the people who had originally written those subsystems to verify that they are in fact not required. I not only put that into pgBackRest but contributed it to Postgres as well, so in 10.0 pg_basebackup will also exclude the files and directories that are rebuilt on restart — why even copy them?
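On the parallelism point above — the degree of parallelism is a single setting; a sketch with hypothetical values and stanza name:

    # /etc/pgbackrest.conf
    [global]
    process-max=8    # number of parallel processes used for backup and restore

    # or per invocation:
    pgbackrest --stanza=main --type=diff --process-max=8 backup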
Something else that went into 10.0 is making the pg_stop_backup wait optional. This is a little arcane, but basically Postgres tries to wait for all the WAL to be archived; since pgBackRest verifies that itself, this is a switch to tell Postgres to just return when the backup stops, and pgBackRest takes care of making sure all the WAL makes it to the archive successfully.

Non-exclusive backups — this was written by Magnus, but I reviewed and tested it. This is a great feature: it allows you to run multiple backups at the same time, so you can run pgBackRest and a pg_basebackup, or two pg_basebackups, or whatever. pg_basebackup already used this API, but it was internal only; external tools could not use it. Now they can.

The archive_timeout fix — this was written by Michael Paquier and reviewed by me and several other people. Anyone familiar with archive_timeout? Basically it lets you push a WAL segment to the archive on a schedule, so if your database is really quiet at certain times of day, instead of one WAL segment sticking around for hours or even days with important information sitting in it, you know that after some interval it will get out to the repo. This had been broken for a number of Postgres versions: the messages written for replicas looked like activity, so Postgres was constantly writing segments out even when there was nothing to write, which was very wasteful. That's been fixed.

We're planning more exclusions. I have a patch that I submitted for 10 which ended up being too complicated — there were too many changes — to allow group read on PGDATA. Right now that's forbidden, but for security purposes it would be nice to do backups as a user other than postgres, say someone in the postgres group with read-only privileges. It would make things a lot safer: if, God forbid, there were any bugs in pgBackRest, it would protect you from them. I'm also looking at a way to pass multiple WAL segments to the archive_command; right now the biggest bottleneck on archiving with that method is the number of times Postgres has to call the archive_command, and the startup cost of that command limits how many WAL segments you can push. And we're looking at a configurable WAL segment size — that patch is being written by Beena Emerson; unfortunately I looked at it late in the 10 cycle and found enough issues that we couldn't commit it in 10.0, but I am dedicated to getting it into 11. It will allow larger WAL segments for busier databases, which reduces the number of archive_command calls, the number of fetches during recovery, all of it.

And here's what we have in the pipeline for pgBackRest. Obviously Postgres 10 support: for those of you who haven't followed along — even if you've seen Magnus's talk about Postgres 10 — what he doesn't mention is that we changed and renamed a bunch of stuff. pg_xlog is now pg_wal, and all the functions and columns that said "xlog" now say "wal". That's going to be kind of a pain — it was a pain for me — but I supported it because it was absolutely the right thing to do for moving Postgres forward. From my perspective as a tool author, I'm happy to make changes when they're for the greater good, and these were. I will tell you: if you have any homegrown backup solutions, Postgres 10 is basically going to break them; this might be a good time to think about an externally supported tool.
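For reference — not from the talk — this is roughly how the non-exclusive backup API and the optional archive wait look at the SQL level in Postgres 10 (label is arbitrary; the second argument of pg_stop_backup is the new wait_for_archive flag):

    psql <<'SQL'
    -- non-exclusive backup: both calls must run in the same session
    SELECT pg_start_backup('sketch', false, false);
    -- ... a real tool would copy the data directory at this point ...
    SELECT * FROM pg_stop_backup(false, false);  -- don't wait for WAL archiving
    SQL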
We're looking at encryption next. This will be a complement to the S3 support — S3 does its own encryption, of course, but the keys are still stored with Amazon, so it's not perfect — so we're going to build encryption in. We're also looking at Zstandard compression; right now we only support gzip. If you haven't played with Zstandard, which just got to 1.0 last year, it is extraordinarily fast: you can get the same level of compression as gzip about five times faster, or, if you spend the same amount of time, roughly 30 percent better compression. Pretty cool. And the last feature we're looking at very soon is parallel archive-get. This hasn't been a big priority because, generally speaking, archive-get can keep up with Postgres replay, since replay is single-threaded — so fetching archives with a single process hasn't been a big deal. But there are edge cases where replay has outstripped archive-get, so we're going to look at pulling batches of WAL down to make that as fast as possible. And that's all I've got. I think I'm almost out of time, but if there are any questions we probably have a few minutes. All right, well, thank you very much — oh, sorry, there is one question.

[Audience question] When you talk about excluding some files and directories, does that mean excluding unlogged tables? No. The problem is that it can be very difficult to identify which tables are in that mode. I did it by identifying files that share a name with a file carrying the init fork suffix. Here's what we're doing — we were working on a patch for this for Postgres 10 and weren't able to get it done in time — we're trying to move some things around in Postgres so these files are easier to identify. If you look at the recovery code where Postgres identifies which files are, sorry, unlogged, it's actually doing it by recognizing the file names, which to me is a little scary — it's exactly the same method we use, and maybe they looked at the code. So what we're trying to do is move those tables to a separate directory underneath, so you can simply exclude that entire directory. Right now, since unlogged tables were introduced, the startup cost of Postgres has actually gotten a lot larger, because you have to scan all the files in the database and check things. So yes, we were working on that; we had a patch and just couldn't get it into 10 — it ended up being a bigger change than expected and we couldn't drop it in on March 1st. But it's a great question and we're working on it.

[Audience question] Can the backups be used to archive and restore just a part of interest — part of a large table, maybe a partitioned table? For example, we have a partitioned table with one month in one partition and three years in others — can we archive parts of it and restore only some of them? So this is essentially the question this gentleman asked earlier — I may not have repeated it, and I apologize: can you restore a single table out of a database? The current answer is no, but we are working on it. It is planned, but currently you have to restore the entire database out of the cluster; you can't do a single table, though it is a planned item for the future.
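Coming back to the pipeline items for a moment: repository encryption and Zstandard compression did land in later pgBackRest releases; a rough sketch with a placeholder passphrase, option names per the current docs:

    # /etc/pgbackrest.conf
    [global]
    repo1-cipher-type=aes-256-cbc
    repo1-cipher-pass=<long-random-passphrase>   # placeholder; keep out of version control
    compress-type=zst                            # Zstandard, available in later releases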
OK, thank you. All right, well, I think that really is all our time, so thank you very much. [Applause]
Info
Channel: PG Day Russia
Views: 1,240
Keywords: PostgreSQL, pgBackRest, database, dba
Id: djvc_4ONTcQ
Length: 50min 35sec (3035 seconds)
Published: Thu Jun 25 2020