Today I'm going to go over some of the advanced features of Proxmox Backup Server, or PBS. These include backing up non-Proxmox data (any path on a Linux system), syncing to offsite servers, encrypting your backups, and more. I'm also going to run a few performance benchmarks to see what kinds of hardware changes make what kinds of performance differences, so you can optimize the speed of your backups. This video builds on my previous PBS video, so if you haven't seen it or used PBS yourself, take a look at that video first to get a feel for PBS and learn the basics before we go into the more advanced topics. I'm going to cover quite a few different topics, so I'll use chapters below so you can easily skip to whatever is most interesting or useful for your use case. I'm going to start with users and permissions. While this isn't the most exciting topic, it's a good place to start because the accounts I create here will be used in other examples later on, and I want it to make sense where they came from. Setting up backup accounts with minimal access is important: if you give those accounts full access, or just use the root account for everything, then if an account is ever compromised, an attacker could, for example, take over your Proxmox hosts and delete all your backups, making something like a ransomware incident much worse. So it's best to limit what each account can do and only use high-level accounts like root when they're actually needed, not for everything. So what I'm going to do right now is add an account under User Management, under Access Control on the left.
So I'm going to add it, call it backup-user for this example, and set a sample password. Under Realm, it's going to use the Proxmox Backup authentication server by default; I could set up an LDAP realm or other options in the Realms tab, but I'm not going to do that here. I'll click Add, and one thing to note is the realm here is going to be pbs, so if you're adding this account somewhere else, you're likely going to have to use backup-user@pbs as the username. If I take a look at the permissions of this account, it has nothing. And when I log in as this other user (instead of the default Linux PAM realm, I select the Proxmox Backup authentication server), it doesn't have any permissions, so I'm not going to be able to see anything. The interface loads, but it says there are no datastores or anything, even though this is the exact same system. So what I want to do is give it the minimal permissions it needs. Under Permissions, I'm going to add a user permission and specify the path it should have access to. This account, for example, is just going to be doing backups to this datastore I have right here. So under /datastore/ plus the name, which is boge here, I'm going to give the user backup-user the DatastoreBackup role. This gives it the minimal permissions needed to back up to it: the ability to create new backups, write to them, and read them. I'll click Add, and those permissions are added to the user. So if I go back to User Management, look at this user, and click Show Permissions, it now has access to this datastore. Then I'll switch to my other tab and refresh the page.
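If you prefer the shell, the same user and permission can be created with `proxmox-backup-manager` on the PBS host. This is only a sketch, assuming the datastore name boge from my example; adjust the names to your setup.

```shell
# Create the user in the PBS realm
# (set its password in the GUI, or with the user update subcommand)
proxmox-backup-manager user create backup-user@pbs

# Grant only the DatastoreBackup role on the one datastore it needs
proxmox-backup-manager acl update /datastore/boge DatastoreBackup \
    --auth-id backup-user@pbs

# Check what the account ended up with
proxmox-backup-manager user permissions backup-user@pbs
```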
And now, under Datastore, I can see this datastore and any content it might have on it. Another tab to look at is Two Factor Authentication. If you're using the web interface, I highly suggest you set up something like TOTP: scan the code with your phone using an app like Google Authenticator. That way the interface is much more secure, and a compromised password doesn't compromise your whole account. The next thing to take a look at is API tokens. For every user, you can create API tokens, and these are primarily used with something like the Proxmox Backup Client on the command line. The advantage is that a token isn't your actual password, so you can easily disable an API token and give it its own specific permissions; you can think of it as a kind of sub-user. So I'm going to create an API token for this user: for backup-user@pbs, I'll set the token name to, say, server1 for this example, and click Add. I get a secret that I'm going to copy and use on that one server only, together with this username. I'll show a bit more on how to use the token later on when I cover the Proxmox Backup Client utility. PBS won't ever show this secret again, so make sure to copy it now. Assuming I did copy it, I can now add a permission for that API token, giving it access to the datastore with just the backup access it needs, and click Add. If it were doing something else, I could give just that token slightly different permissions. Then, for example, let's say that server got compromised.
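The token workflow can also be sketched on the command line; the user, token, and datastore names here are the ones from my example.

```shell
# Generate an API token for the user; the secret is printed only once
proxmox-backup-manager user generate-token backup-user@pbs server1

# Give the token its own minimal permission on the datastore
proxmox-backup-manager acl update /datastore/boge DatastoreBackup \
    --auth-id 'backup-user@pbs!server1'

# If the server holding the secret is compromised, revoke just the token
proxmox-backup-manager user delete-token backup-user@pbs server1
```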
I could just go and click Remove, and now that server's access is gone. I don't have to worry about someone who got access to that server, or the files on it, having access to anything else in my backups, because the token is revoked. Now I want to take a look at namespaces. Namespaces were added relatively recently, in PBS 2.2 I believe, and can be thought of as folders to put VMs, containers, and other backups inside of. So within your main datastore, you can put things in the root namespace, where I just have these VMs, but I can also create a different namespace; maybe I'll call it something like "important". Then I can double-click on it, and now I have a new namespace with its own backups inside it. And because it's a separate namespace, I can apply different rules to the data in it. One example of this is pruning: if I create a new prune job, I can say keep-last 5, and then under Namespace I select which namespace it applies to. This makes it pretty easy, for example, to keep some important VMs significantly longer than my normal VMs, which I keep fewer copies of. So I can have different prune jobs; sync jobs and a few other things can also be namespace-specific, including permissions. Now, when we were looking at permissions before, I didn't have any namespaces created, so they didn't show. But I also want to note that namespaces don't show up in the permissions GUI, even though you can still add them, say for this API key, if I want to give it access to just the important namespace. If I go to add an API token permission and look under Path at /datastore, I don't see anything here for it.
But what I can do is type /important onto the path myself, pick the API token I want, and it will work correctly even though the GUI doesn't show or prompt it. That's one thing I wish they would change; I've done some testing and it works fine, they just don't have the GUI option to show it to you. If I go back to my datastore, I can now look at my namespace and run many things specific to it. One thing you might be thinking is: I could just add another datastore. I could create another folder on that same mount point on my hard drive (or whatever backs my datastore), create another datastore there, and still do all of these things separately. So why would I want a namespace? The big reason is that the deduplication data is shared, whether a backup is in a namespace or in the root namespace, and what that means is it essentially uses less disk space. So if you have a Windows VM you want to keep longer in a specific namespace, and then normal Windows VMs you don't need to keep as long, all of those Windows systems are likely going to share a reasonable amount of data, and since it's all a single deduplication pool, it'll use less space on disk. If the important Windows VMs you wanted to keep longer were in a separate datastore instead, it would actually have to store a lot of that data twice and use more space. This does mean that some things like garbage collection will still take longer, because garbage collection has to look through everything; there's no way to run it at a namespace-only level, since it has to consider all the namespaces. Another question you might have is: if I'm using Proxmox VE and backing up to PBS, how do I tell it which namespace to go to? The way you do it is you add another PBS storage entry, and when you're setting it up, you set the namespace to whatever you want.
Just make sure that this namespace matches whatever your namespace is called in the Proxmox Backup Server web interface. Once you've done that, you can click Add; it's going to be added as its own storage and appear separately. That way, when I go to back up a VM, I can click Backup Now, put it in this important one, and click Backup, and it should start storing that data in the important namespace. I can see that's the VM being backed up right now, in its own namespace. And even though it's creating a fresh backup, it should use significantly less disk space, because it shares the same deduplication table as all the data in the root namespace, which holds very similar VMs already. Now let's take a look at using PBS to back up whatever files you want on a Linux system. While PBS is designed mostly to back up Proxmox containers and VMs, the Proxmox Backup Client can be installed and used to back up any file on a Debian-based system. Let's make that work now. The first thing you'll want to do is install the Proxmox Backup Client. That's a utility that works with the main Proxmox Backup Server, runs on the command line, and lets you make backups, restore backups, prune backups, and do other tasks. It's pretty much feature-complete with the GUI, but on the command line, which is useful if you like the command line or want to script things. If you have a Proxmox VE system, it comes pre-installed on newer versions. But if you're on a normal Debian system that doesn't already have it, you can use the Proxmox Backup Client-only repository here: copy and paste it into a file under /etc/apt/sources.list, then apt update and apt install proxmox-backup-client. Once that runs, you should have the Proxmox Backup Client on your system.
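As a sketch, on a plain Debian 12 box the client-only repository can be added like this; match the suite name ("bookworm" here) to your Debian release.

```shell
# Add the Proxmox pbs-client repository (Debian 12 "bookworm" shown)
echo "deb http://download.proxmox.com/debian/pbs-client bookworm main" \
    > /etc/apt/sources.list.d/pbs-client.list

# Trust the Proxmox release signing key
wget -O /etc/apt/trusted.gpg.d/proxmox-release-bookworm.gpg \
    https://enterprise.proxmox.com/debian/proxmox-release-bookworm.gpg

apt update && apt install proxmox-backup-client
```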
If you want to test whether your system has the Proxmox Backup Client, you can just run proxmox-backup-client; if it runs successfully and shows you all the available commands, you have it. Now let's get some backup jobs done. If I run proxmox-backup-client backup, it'll show me a few more options and everything here. Another great resource for figuring out how to make backups with it is the backup client usage page here; you can Google "Proxmox Backup Client" and it should bring you to this page, which gives a lot of information and a lot of great examples. I'm going to build on that today with some of my testing knowledge too. Let's first run a very simple backup to get a feel for it. You run proxmox-backup-client, then backup is what I want to do, and then the next thing we have to tell the system is what files we're backing up and where they should go in PBS. When it comes to where they go in PBS, it's going to be in some sort of archive file. If it's a single file or block device, you use a .img archive; that's for something like a large backup image or a snapshot of a disk, and in the docs example here you can see mydata.img backed by an LVM path. If it's multiple files, you use .pxar. In my example I'm backing up a directory, so I'm going to use a pxar archive and just call it something like backupdata.pxar. The name here is what you'll use when you refer to the backup files and restore them; it doesn't matter otherwise and isn't related to the actual files you're backing up. Then I add a colon, followed by the path on this system that I want to back up. In this case, it's going to be this path here.
So now it says: take the data in this directory right here, back it up into this pxar archive, and put that in my PBS repository. Now, how does it know which PBS repository to put it in? I type --repository and then my repository string. There are a lot of different options here, depending on how your permissions and setup look, and this little table is a great resource for figuring out what you need. It depends on the user you're using, the hostname and port, and the datastore. By default, it assumes the root@pam user on localhost with port 8007, so the only information you have to actually give it is the datastore. But if, for example, PBS is on another system, you have to give it the hostname you want to connect to plus the datastore. If you want a different user than root@pam, you give it the username (user@pbs, or whatever the realm is), then the hostname and datastore. If you want a different port, you put a colon and the port before the datastore. You can also use IPv6 addresses in square brackets if you'd like, and if you have a token, the format is user@pbs!tokenname, then @host:datastore. So in this example, I'm going to use the backup-user I created earlier: backup-user@pbs@192.168.1.80 (the system I'm using for PBS storage), then a colon and the name of the datastore I'm backing up to. If I just run this, it starts doing backups. I've run this command before, but if you haven't, it'll prompt you for the password for that account and you can type it in interactively. That can be great for interactive use, where you're actually typing out backups yourself as a single user.
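Since the repository string is just those pieces concatenated, it can help to build it up in a script. A small sketch, using the host, datastore, and token names from my example:

```shell
# Pieces of the repository string (values from my example; adjust to yours)
USERID="backup-user@pbs"
HOST="192.168.1.80"
DATASTORE="boge"

# user@realm@host:datastore
REPO="${USERID}@${HOST}:${DATASTORE}"

# With an API token, the token name follows a '!': user@realm!token@host:datastore
TOKEN_REPO="${USERID}!server1@${HOST}:${DATASTORE}"

# A non-default port goes between host and datastore: host:port:datastore
PORT_REPO="${USERID}@${HOST}:8008:${DATASTORE}"

echo "$REPO"
echo "$TOKEN_REPO"
```

The string then gets passed as `--repository "$REPO"` (or via the PBS_REPOSITORY environment variable covered next).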
But for a lot of use cases, you might want to script it or something like that, and that's where the environment variables come in handy. One of them is the repository you want to use: instead of passing --repository, you can set that exact same string as an environment variable. Other options include PBS_PASSWORD, for either the password you want to use or an API token secret. If you're using a password in scripts, it can also be really nice to point at a password file; that way the password can be updated by something else, and someone can modify or work with the script without being able to actually see the password, since only the thing actually running it can read the password file. You can set up encryption, which I'll go over in a little bit, and you can also set the fingerprint. That's the same fingerprint you see on the dashboard and copy into Proxmox VE, and it's really nice because it means no one can man-in-the-middle you and see what your backups are. So how would you set these environment variables in a script? One good example is some scripts I created earlier for my backups. In this example, I have a test variable that I've set to my name: export, then the name of the environment variable, then equals Brandon. You'll likely want quotes here in case there are any spaces or special characters, so they're handled correctly. Then I can run the client, and since the environment variable has been set, it knows the password for the account it's going to use. Another example is using it with an API key.
If I take a look at this file right here, I can see I have an API key in it. I have my encryption password in quotes, to make sure that any special characters or spaces are handled fine, and then I have the API key that I copied, with all the advantages of using an API key. And since I'm using an API key, my repository string includes @pbs!server1 too, the name of the API key I'm using in this case. Now, switching over to interactive mode, there are a couple more things I want to show. One of those is namespaces: I can pass --ns with that important namespace I set up earlier, so that when I run another backup, it doesn't just land at the top level. If I refresh, I can see this host/pve-test group has been created here, but if I switch to important, I can see a backup has been made there too. One thing I want to talk about with namespaces, which can be really important with the Proxmox Backup Client, is backing up multiple locations. If I wanted to back up multiple locations, there are a couple of ways to do it. First, I could just add another path: maybe something like homedata.pxar, then a colon and /root, my home directory on this system. That way it's actually backing up two things: the main backup archive I did earlier, plus my home directory into that homedata archive. Now, one thing I do want to note is that all the backups from one host end up in the same place right here, and there's not really a way to change the group name. So if you want to back up files at different frequencies or times, rather than building them into one job like I did here (one big backup job with two archives in it), I recommend putting them in different namespaces.
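Putting the environment variables together, a minimal backup script might look like this. All the values are placeholders from my example, and PBS_PASSWORD doubles as the API token secret when the repository string names a token; the actual client invocation is shown commented since it needs a reachable PBS server.

```shell
#!/bin/bash
# Repository using the API token created earlier (hypothetical values)
export PBS_REPOSITORY='backup-user@pbs!server1@192.168.1.80:boge'

# The token secret (or account password); quoting protects special characters.
# In a real script, read this from a root-only file instead of hard-coding it.
export PBS_PASSWORD='example-secret'

# Pin the server certificate so a man-in-the-middle is detected
# (copy the real value from the PBS dashboard)
export PBS_FINGERPRINT='aa:bb:cc:...'

# Back up two directories into the "important" namespace:
# proxmox-backup-client backup \
#     backupdata.pxar:/bodge \
#     homedata.pxar:/root \
#     --ns important
```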
Because otherwise, if you're running multiple different backup jobs into the same namespace, it kind of gets confused: you end up with different sets of files in one group, and prune jobs don't work well. So I'd recommend either putting all your backup archives together in one job, or putting each job in its own namespace, for example if you want to back up one drive very often and another drive less often; that way you can set your prune jobs for them separately. Now that that backup job is finished, if I take a look at host/pve-test, I can see it has two pxar files: one is that homedata archive and one is the backup one. One question you might have is: if I run this backup job again, how does it handle running multiple times, in terms of incremental backups and saving bandwidth? What it does on a second run is first download information about what data already exists on the datastore, and then compare the data it's backing up against that, to save network bandwidth. Since essentially nothing changed between the first backup I ran and the second, it should save significant time, because it isn't actually sending any data to PBS. The client does always read all the data from disk, though, so in this example it still has to read all 12.2 gigabytes from disk every time; it just sends none of it over. The other thing you can do, if you only want to back up some files, is use exclusion files. If I want to exclude files, I can use a .pxarexclude file; since it starts with a dot, it's hidden, and it lists all the files I want the client to skip. To show off how file exclusion works, I created a few extra empty files right here. Normally when I run a backup, it's going to back up all of this data, but maybe I don't want it to do that.
So I've created a hidden file; if I run ls -a it shows up, and if I look at what's in it, I can see it has test1 and test2. The client will look for this hidden file, see those two entries, and won't back those files up. Let's do a quick demonstration to show that's actually the case. Now that my backup job is finished, I can see it backed up a little bit of data, so let's verify it actually excluded the files I wanted it to. Here's the backup archive that contains all those files; I'll click Browse, and I can see it contains test3, but not test1 or test2, because the .pxarexclude file (which it also backs up) told it not to include them. And that's kind of cool: it actually backs up the .pxarexclude file itself, so you know what was skipped. There are a few fancier things you can do with .pxarexclude files, and the docs go over them if you want to take a look. The other thing is that, by default, it won't cross mount points. One easy way to check for mount points is to run something like df -h to see where your drives are mounted. What this means, for example, is that if I were to back up /bodge, it wouldn't back up any of these PBS VM files under a mount point, because that isn't just a standard folder; another drive is mounted there, and the client won't follow those mount points. It also won't back up some directories you typically don't want to back up on a Linux system, like /dev and /sys, which are generated on the fly. You can use the --include-dev option with the exact path you want if you explicitly want to include something mounted on the system, but you do have to tell it to include that mount point, because it excludes them by default.
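To recreate the exclusion demo, you put a .pxarexclude file at the top of the directory you're backing up, listing one pattern per line. A runnable sketch (the /tmp/pxar-demo path is just for illustration):

```shell
# Build a demo directory with three empty files
mkdir -p /tmp/pxar-demo
cd /tmp/pxar-demo
touch test1 test2 test3

# Tell proxmox-backup-client to skip the first two
printf 'test1\ntest2\n' > .pxarexclude

# The hidden file shows up with ls -a
ls -a

# The resulting backup would then contain test3 and .pxarexclude only:
# proxmox-backup-client backup backupdata.pxar:/tmp/pxar-demo
```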
Now that you've created your backups, you might need to restore them to get your data back. Let's go over that process now. PBS has three different ways of doing it. One is where you specify a path, and it restores the backup to that path, copying all the data from the backup there. The next way is interactive: it opens a shell, you select the files you want, and then you restore those files. And the third way is to mount the backup via a FUSE filesystem; then you can cd into the backup and copy out whatever files you want, however you'd like. Three different methods, so you pick whichever fits your exact use case, and I'll go over a couple of examples now. But before you can do any restores, you actually have to find out which backup and snapshot you want to restore from. So I'm going to run proxmox-backup-client list, which does very similar things to what you can do in the web GUI: it lists your backup groups. It lists host/pve-test here, the files in it, and the number of backups, and it shows me the last snapshot. If I want a namespace, I can use that same --ns important, and it'll show me all the groups inside important. Now let's say I want to restore a previous snapshot, not just the newest one. I can use snapshot list with the group name right here to list all of the snapshots. In this case it's only going to list one, but with --ns important there are multiple: I can see I have multiple different backups, the exact time each one was taken, and its size. Now let's restore some of these files. This is going to be the first method I talked about: restoring to a file path.
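The listing commands I'm running look like this; the group name host/pve-test is the one from my example.

```shell
# List backup groups in the root namespace of the configured repository
proxmox-backup-client list

# The same, inside the "important" namespace
proxmox-backup-client list --ns important

# Every snapshot of one group, with timestamps and sizes
proxmox-backup-client snapshot list host/pve-test --ns important
```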
So I'm going to use proxmox-backup-client restore; the repository goes on the command line here, but I could also set it via that environment variable I talked about earlier. Then comes the full path to the backup snapshot: if I take a look, that's going to be this latest snapshot here, or under snapshot list, this one right here; any of them works, depending on which data I want to restore from. Then I give it the archive I want to restore, typically a pxar file, or maybe an img file, and finally the path on this system where I want it restored. In this example, that's /bodge/restore. Now that my restore job is finished, I can see I have this file right here, ready to use, with exactly the same contents as when I backed it up. Another thing I want to note is that PBS maintains permissions, access times, and other metadata on the files. Now let's take a look at the next way of accessing files in a Proxmox backup and restoring them, and that's the catalog shell option. It opens a shell that's kind of like a Linux shell, but a little different, where you can view all the files and interact with them. But I'm going to do something slightly different first: a catalog dump. Taking a look at the options, I can see them here, catalog dump and catalog shell. So first I'll run a dump, which just lists everything within the catalog. In this example I can see D for directory, F for file, and all of the files I backed up. But maybe I want to look through those files and select specific ones, so I'm going to change catalog dump to catalog shell, and now I've entered the interactive shell.
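The path-restore form is restore, then the snapshot, the archive name, and the target path. A sketch; the snapshot timestamp here is a placeholder, so substitute one from your own snapshot list.

```shell
# Restore the whole pxar archive from one snapshot to a local path
proxmox-backup-client restore \
    "host/pve-test/2023-11-28T12:00:00Z" \
    backupdata.pxar \
    /bodge/restore
```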
If I just press enter a few times with no input, it shows me all the options and commands I can use here. It's a pretty limited set of commands, essentially built around selecting files and then restoring the selection. The Proxmox wiki page also has some really nice examples, like finding specific files, so let's try that now. I'm going to run find *.mov --select to select the movie files I have in this directory. It looks like I have four MOV files, and then I can run restore-selected. Running that restores all of my files ending in .mov, and if I take a look at where I restored them, it maintains the file structure but only restores those MOV files, which could save a substantial amount of time. While this does work, it has a few compromises: some commands don't work right, and the other problem, as you can see here, is that I've made a lot of mistakes, because it's not the standard Linux command line that you or I might be familiar with. Now, if I want the full Linux shell and all of my normal utilities, one thing I can do is just mount the backup as a filesystem. I run proxmox-backup-client mount with the repository I set earlier and an existing directory I created on this system. It prints a FUSE library version, and unfortunately it doesn't show up in df -h as mounted, but if I run mount, it shows up at the very bottom: a fuse filesystem on /mnt/restore, of type fuse. So now if I go to /mnt/restore and run ls, I can see the backups I've made, and the great thing is, because it's a full Linux command prompt, I can use find -name '*.mov' and there are all my files with .mov in the name, and I can use the full find command to restore whatever I want.
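The mount workflow, sketched end to end (again with a placeholder snapshot timestamp):

```shell
# Mount one snapshot's archive read-only via FUSE
mkdir -p /mnt/restore
proxmox-backup-client mount \
    "host/pve-test/2023-11-28T12:00:00Z" backupdata.pxar /mnt/restore

# Any normal tool now works against the backup contents
find /mnt/restore -name '*.mov'

# Unmount when done
umount /mnt/restore
```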
I could find newer files, or files within a specific date range, and restore just those instead of everything, for example. I could also rsync these files somewhere else, although if I was just going to rsync all of the files, I might as well use PBS to restore all the data directly. One thing to note is that the FUSE mount has a lot of overhead, because it's reading directly from a backup instead of a faster storage medium, so it's likely not going to be fast for a lot of filesystem commands, especially if this is a remote system. So I'd recommend a full restore to a path if you can. Otherwise, the FUSE filesystem is probably what I'd pick next, because I'm used to the Linux command prompt and it lets me use all the commands I want, unlike their custom shell, which is a little limited and where things operate a bit differently than the standard commands would. Now let's take a look at doing encrypted backups of VMs and containers on Proxmox Virtual Environment. Taking a look at it now, I've just punched in all the basic information, and to set up encryption I'm going to go to this Encryption tab right here, which gives me a few options. By default, it doesn't encrypt backups; it can auto-generate a key or upload an existing one. Since I don't already have a key (I could create one in the Proxmox Backup Client if I'd like), I'm going to click auto-generate a key. When I click Add, I can print the key, which gives a physical copy of it that I could print from my system right now; that key is all the information I'd need to recover. I'm going to skip printing and just download the key. They give you several different ways of saving this key, and saving it somewhere safe is important, because it's not coming back. Now that I've clicked Close, all the backups stored here are going to be encrypted.
So if I take a look at local-encrypted, I can see that encryption is set up, and I could edit the existing key. When I go to back up VMs and containers now, they'll show as encrypted in PBS, and I won't be able to browse their contents in the web interface there. The nice thing, though, is that all the encryption is done in PVE on this server, so you don't have to trust the Proxmox Backup Server the data is going to: there's no way that server can actually see the data in your VMs and containers unless it has that key. Same with offsite syncs; this is a great way to sync to a server you don't trust, because they can't access your data either. They can just verify it's correct and update it with new snapshots over time. Now let's take a look at doing encryption in the Proxmox Backup Client, and let's set up a key right here. So I'm going to go create a key right now. By default, it prompts me for a password to encrypt the key with, so that I need both the key file and the password. I can also create it with --kdf none to not require a password, so that the key file alone lets me restore these backups. Now let's run a backup requiring the key: it prompts me for the encryption key password. If I type something too short, it says the passphrase is too short; if I type something longer that's wrong, it says unable to decrypt, OpenSSL error. But if I type the correct password, it starts to back up, and now my encrypted backup is finished. One thing you might notice is that the duration is much longer, and that's because an encrypted backup's data is stored completely differently, so it can't reuse the existing chunks. If I take a look at the data here, it said mixed once I started encrypting, and now it shows as encrypted. So since I made an encrypted backup, let's try mounting that one right now.
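The client-side key commands look like this; /root/backup.key is just the path I'm using in this example, and the two key create lines are alternatives, not steps to run back to back.

```shell
# Create an encryption key protected by a passphrase (prompts interactively)
proxmox-backup-client key create /root/backup.key

# Alternative: no passphrase, so the key file alone unlocks the backups
proxmox-backup-client key create --kdf none /root/backup.key

# Use the key when backing up
proxmox-backup-client backup backupdata.pxar:/bodge --keyfile /root/backup.key
```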
If I try to mount it without the key, it's going to say it's missing the key it was encrypted with. It's going to give me a little bit of information about the key so I can verify which one is needed, but it's not going to let me do it. So now I'm going to pass the --keyfile parameter in, and with that I can point it to /root/ or wherever I saved my key file. Now that I've put my key file on the system, I can put the password in, and now that the password is correct, it's been mounted. And if I run mount, it's going to show that it's been mounted correctly and I can access all the files from it. I just want you to be very careful with the encryption key on the system, because if you lose the key, you lose the backups. Now let's take a look at the sync feature in PBS. This feature can be used to sync between multiple different PBS servers, and that can be used to easily have an offsite copy of your data. Just in case something were to happen to your main onsite location, you still have an offsite copy of the data that you can restore from. I'm going to be going over the process of syncing to that offsite copy and then restoring backups back from that copy. The other nice thing is PBS handles this pretty well, minimizing the amount of bandwidth it has to send over the WAN connection by only copying the changed chunks, which is good for the relatively slow and potentially high-cost WAN connections you might be using. Now let's talk a little bit about how this connection works. PBS uses a pull mentality when it comes to doing offsite syncs. That means the sync settings and all of that sort of thing are going to be configured on the destination server. You can think of the destination server as pulling the backups from your onsite copy, and it will manage everything. 
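The mount-with-key step above might look like this. Again a hedged sketch with placeholder snapshot and path names:

```shell
# Mount an encrypted archive; --keyfile points at the saved encryption key.
# If the key was created with a passphrase, you'll be prompted for it here.
proxmox-backup-client mount host/pvetest/2024-01-01T00:00:00Z root.pxar /mnt/pbs \
  --keyfile /root/backup.key

# Confirm the FUSE mount is active:
mount | grep /mnt/pbs
```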
And this can often be advantageous because it means the offsite server has access to the onsite server, but the onsite server can't access or do anything to the offsite server. Another thing to think about when setting up offsite sync backups is how you make that internet connection work, because oftentimes you're going to have to go over a VPN or set up port forwarding or something like that. I'd suggest setting up a VPN if possible; that way Proxmox is encased in the VPN and never exposed to the WAN, which is often the most secure, and it can also be convenient if you already have a VPN set up. But if you don't have a VPN set up, PBS uses port 8007 over TCP by default, and you'll need to port forward to the onsite server to do the remote syncs. So for example, if location A has my main data and virtualization stuff and location B has my offsite server, location A would have to do a port forward on 8007 to the PBS server that location B wants to sync from. So it's going to pull from A into B, and A needs to be port forwarded. When it comes to restoring backups across the WAN connection, you need to port forward the remote system instead. If you port forward that same port 8007 over TCP, then the Proxmox Virtual Environment server can access those remote backups over the WAN connection. With that out of the way, let's start setting up our sync job. For the test environment I'm going to be showing off today, I'm going to be using that main server I was doing all my other test jobs with as my source system. Then I have a remote system running in a VM that I'm going to be using as my destination system. It's actually over a WAN connection, so I can see how it deals with the lower bandwidth of WAN connections and also the higher latency, just to see how well that all works in my testing. 
Taking a look at my remote server first: since I'm doing a pull, all of this is going to be set up on the remote system. I'm first going to go under the remotes tab here, click add, and then put in the information of the local system that I want to pull the backups from. So I'm just going to give it a name like local-backup. I'm going to give it the IP address of the system; since it is over a VPN, I can use whatever the local IP is. I then give it the backupuser@pbs username. Then I'm going to copy and paste the fingerprint from the main server that I'm pulling the backups from. And now it's set up as a remote server I can pull the backups from. Now under the data store right here, I'm going to go under sync jobs and then add. It's going to fill in some of these things by default for me, and under source remote, I'm going to pick local-backup, which is its name. I'm going to pick the data store that I want to back up. The depth setting is how many namespace levels down it should sync. And I get an option for remove vanished, which means that if the source system deletes a backup snapshot, it'll delete it on the destination too. That means you can set up a prune job on the source system and it'll essentially mirror those prune decisions on the destination system. If you want to do something different, like maybe have your remote system keep backups longer but less frequently, don't check this and set up your own prune job. You can also set things like rate limits, or put it into a different namespace on this system, and it's going to run every hour and do a new sync job each time. But just to demonstrate, I'm going to run it now, and it says it's found three groups to sync. It's starting to sync that host/pvetest group, it's going to start syncing the archives between the systems, and it should start to copy data. And if I go under content and reload it, it'll likely start showing these items here. My sync job is now finished. 
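The same remote-plus-sync-job setup can also be done from the destination server's shell with proxmox-backup-manager. This is a hedged sketch: all the names, addresses, and the fingerprint are placeholders, and flag names may vary slightly between PBS versions, so check the manager's help output.

```shell
# On the destination (offsite) server: register the onsite PBS as a remote.
proxmox-backup-manager remote create local-backup \
  --host 192.168.1.80 \
  --auth-id 'backupuser@pbs' \
  --password 'EXAMPLE-PASSWORD' \
  --fingerprint 'aa:bb:cc:...'   # copy from the source server's dashboard

# Create a pull sync job from the remote's datastore into a local one,
# running hourly and mirroring deletions from the source:
proxmox-backup-manager sync-job create pull-from-local \
  --remote local-backup \
  --remote-store boge \
  --store offsite-store \
  --schedule hourly \
  --remove-vanished true
```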
It took a little bit of time because there was a good amount of data on it, but it ran at about the speed I'd expect from the WAN connection I was using. And one thing you might notice is that all the VMs and everything from my other backup server have been copied successfully, so this server has all the same data as that one. And that's exactly what the sync job is designed to do: keep two servers in sync and run automatically to make sure they always stay in sync. But now that you have these offsite backups, how do you use them? I'm guessing you're only going to actually restore from offsite backups if your onsite backups have an issue, because otherwise restoring from onsite backups would be faster and easier in most cases. So I'm going to assume your onsite backup server has failed. What I'd do is set things up so the main system that connects to Proxmox Backup can now reach the offsite system. If you're using a VPN, it can likely do this already, but if you're going over a WAN connection, you might need to port forward the offsite system and connect to it. I'd suggest testing this when you set up the server, and occasionally afterwards, so that you're not trying to figure out your networking when your main server has failed and you're trying to get at your backups. So in this case, I'm running Proxmox Backup Client and telling it to list the backups, and instead of pointing at my previous repository at 192.168.1.80, I'm running it against this offsite backup server over my WAN connection. Now I can list all the backups, and I can do all the same things to access them and restore them. Now let's talk about doing that in Proxmox Virtual Environment. If I go into my PVE server, I can add it as a PBS storage like any other system. So if I go to Proxmox Backup Server, I can name it offsite, and I can set the server to the IP address that I set up earlier. 
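Pointing the client at the offsite server is just a matter of changing the repository string. A hedged sketch with a placeholder hostname (older client versions spell the listing command `proxmox-backup-client list`):

```shell
# Use the offsite repository instead of the usual onsite one:
export PBS_REPOSITORY='backupuser@pbs@offsite.example.com:offsite-store'

# List the snapshots that were synced offsite:
proxmox-backup-client snapshot list
```

From here, every restore and mount command shown earlier works the same way, just slower because it's going over the WAN.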
And then once I've pointed all my settings at the offsite system, I can click add and it's going to add the storage. I could also just add this as a regular backup target and do backups to it directly, but I think it makes a lot more sense in most circumstances to just do a sync job. Now that I can see I have my offsite storage here, if I look at backups, I can see all the backups that this system has. So let's say I have VM 151, and I want to restore this backup to a new VM ID. I can hit restore right now, and it's going to start restoring over the WAN. It's going to be a bit slower because your WAN connection is slower. And if you're in a situation where you may have many terabytes of data to restore and a relatively slow WAN connection, if you can physically get your hands on that offsite system, it might make a lot of sense to just go pick it up from wherever it's being stored offsite. I could also sync from the offsite system back onto the onsite system, perhaps if I wanted previous backup copies after my onsite PBS server failed, but if I'm doing a restore of VMs or files, I think it makes a lot of sense to just skip that middleman and restore directly. It's likely going to be a faster way to get your VMs up and running again. Taking a look at the system right now, it's been a bit over a minute and it's only 3% done with this relatively small VM. So it's not the fastest thing out there, but it is going over a WAN connection. With my connection, I'm getting about 60 megabytes per second, which, having seen other tests I've run on that connection, makes sense to me. Now I want to take a bit of a technical deep dive into how PBS works under the hood. I find it interesting to know how software does things, and sometimes knowing what's going on under the hood helps you make better hardware and software choices and use the features it has more effectively. 
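The GUI step of adding the offsite PBS as storage on the PVE host has a CLI equivalent via pvesm. A hedged sketch with placeholder names and a placeholder fingerprint:

```shell
# On the PVE host: add the offsite PBS instance as a storage called "offsite".
pvesm add pbs offsite \
  --server offsite.example.com \
  --datastore offsite-store \
  --username 'backupuser@pbs' \
  --password 'EXAMPLE-PASSWORD' \
  --fingerprint 'aa:bb:cc:...'

# Its backups then show up like any other PBS storage:
pvesm list offsite
```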
PBS is what I'd consider a chunk-based backup solution. The way it works is it takes the VMs, containers, and other files you're backing up, splits them into small chunks of maybe just a handful of megabytes each, compresses those chunks, and then checksums them. Whenever it makes a new backup, it takes each newly checksummed chunk and checks whether a chunk with that checksum already exists on disk. If it does already exist, it marks it as already present and drops it, since it's a duplicate chunk of data that's no longer needed. The reason they do it this way is it helps make the most of your disk space. A lot of other backup software uses full and incremental backups. These can use significantly more space on disk, especially if you want to keep older copies, where you may be keeping multiple full backups. Those full backups are going to be relatively similar to previous full backups and use quite a bit of additional space. The advantage of full and incremental backups like that is there's much less load on the filesystem and much less overhead to process. For example, if I wanted to delete a few older backups in a full-plus-incremental backup chain, I'd just have to delete maybe a dozen files on that system. But in PBS, if I want to delete an older backup, I have to run garbage collection. The way garbage collection works is it reads through all the backups on disk and touches each referenced chunk, which updates the atime attribute that records when the file was last accessed; chunks whose atime wasn't refreshed are no longer referenced and can be removed. Because of this, when using PBS, you don't want to disable atime on your filesystem. Disabling atime is often recommended on ZFS and other filesystems to help performance, but PBS actually needs this attribute to keep track of which chunks are being used on disk and which ones aren't. 
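The chunking-and-dedup idea above can be illustrated with a toy shell sketch. To be clear, this is not PBS's real on-disk format, just a minimal demonstration of the principle: split data into fixed-size chunks, name each chunk by its SHA-256 digest, and skip any chunk whose digest is already stored.

```shell
# Toy chunk store: content-addressed by SHA-256 digest, like PBS in spirit.
mkdir -p /tmp/chunkstore

store_chunks() {
  rm -f /tmp/chunks.*
  # Split the input into fixed 4 MiB chunks (PBS uses smarter chunking).
  split -b 4M "$1" /tmp/chunks.
  for c in /tmp/chunks.*; do
    digest=$(sha256sum "$c" | cut -d' ' -f1)
    if [ ! -e "/tmp/chunkstore/$digest" ]; then
      # New chunk: compress and store it under its digest.
      gzip -c "$c" > "/tmp/chunkstore/$digest"
    fi
    # Duplicate chunks are simply dropped -- that's the deduplication.
    rm -f "$c"
  done
}

# 8 MiB of zeros = two identical 4 MiB chunks -> only one chunk lands on disk.
head -c 8388608 /dev/zero > /tmp/data.bin
store_chunks /tmp/data.bin
ls /tmp/chunkstore | wc -l    # 1
```

Running store_chunks again on the same file stores nothing new, which is why repeated PBS backups of mostly unchanged data are so cheap in space.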
And since PBS uses a huge number of files, often into the tens of thousands, hundreds of thousands, or even millions if you have a large data store, it can take a good amount of time to read through all the backups and do this. One thing that helps speed up these garbage collection tasks is to store it all on SSDs. But since SSDs are pretty pricey, it's often recommended to use a ZFS special device instead. ZFS special devices store the pool's metadata. If you want to make a simple pool with ZFS special devices, you can use a command like the one below: zpool create with a RAIDZ (essentially a RAID 5 type layout) of a few drives, and then a special mirrored set of two SSDs to significantly speed up these operations. Here's a quick benchmark I ran of some garbage collection operations to see how big of a speed increase you can get. Using a special device in ZFS will essentially give you the same speedup as using all SSDs for the data, because the data isn't being touched anyway. One thing to note with a special device in ZFS is that because it stores all the metadata, it's needed for the whole pool to work. So if you lose your special devices, you lose all the data, and that's why it's recommended to use mirrored special devices, so your special device won't be a single point of failure for your ZFS pool. Now let's take a look at some of the performance benchmarks I ran. I ran these on different systems with different drives to try to compare when additional speed might be worth it and which upgrades are most worthwhile. I primarily looked at CPU speed and disk speed. In all of my tests I was using a 10 gigabit network, and it pretty much never seemed to be a limitation in my testing. I benchmarked four different tasks: creating backups, pruning backups, doing garbage collection, and restoring from backups. 
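The pool layout described above, four spinning drives in RAIDZ plus a mirrored SSD special vdev, would look something like this. A hedged sketch: the pool name and device paths are placeholders, and on real hardware you'd normally use /dev/disk/by-id paths rather than sdX names.

```shell
# Four HDDs in RAIDZ1 for bulk chunk data, plus a mirrored pair of SSDs
# as the special vdev holding the pool's metadata (which is what garbage
# collection hammers). Losing the special mirror loses the whole pool,
# hence the mirror.
zpool create backup-pool \
  raidz /dev/sda /dev/sdb /dev/sdc /dev/sdd \
  special mirror /dev/nvme0n1 /dev/nvme1n1
```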
When it comes to creating backups, I made a four-drive RAIDZ of either SSDs, hard drives, or hard drives with SSDs as special devices. In my case, the drive choice didn't matter for creating backups, likely because the process was limited by the system making the backup anyway. In all of these cases, I was doing my initial backups at about 500 megabytes per second. But one thing to note is these backups shot up to about a thousand megabytes per second when I ran them on a faster system. My slower CPU system was a dual Xeon Sandy Bridge system, which is pretty old now, and my faster system was a Core i9 system. In almost all of my testing, I noticed very few CPU cores being used by PBS: maybe one core, maybe a little more. It looks like most of PBS's operations, and the backup operations in Proxmox VE that back up to PBS, are very lightly threaded. So you likely want to aim for clock speed instead of core count for systems running PBS or doing backups in PVE. When it comes to the prune job, anything with SSDs was quite a bit faster, but typically pruning is a relatively fast job, so I wouldn't worry about optimizing that too much. On the other hand, garbage collection can take quite a while. When I ran a garbage collection on mechanical hard drives, it took quite a bit longer than anything else. Adding a special device significantly sped this up. And one thing I noticed with large backups on hard-drive-only configs is that these garbage collection tasks can take so long they almost get in the way of other things, so I'd highly recommend having a special device if you have more than just a few backups of VMs. Just running on mechanical drives is likely going to be fine for light home server use, but if you have multiple servers or multiple terabytes of data, you really want that SSD as a special device. 
Another thing to note is my garbage collection speeds more than doubled going from my old Sandy Bridge Xeons to my Core i9, so the CPU is definitely doing quite a bit in those tasks too, and having a newer, higher-clocked chip will help speed up garbage collection as well. When it came to restoring backups, having a faster CPU helped a reasonable amount, especially with large chunks of empty data, where the faster CPU's compression and decompression speed helped a lot. Adding a special device helps a little when it comes to doing restores, but going all SSD does speed up backup restores quite a bit. Looking at this data and my other experiences with PBS, here are some recommendations to keep your backups running quickly. First of all, try to make your frequent backups only of running VMs. With a running VM, PBS creates a dirty bitmap, so if you back up a VM in the morning and at night, it'll only back up the blocks that changed on the VM between morning and night instead of the whole VM. Whereas if you're using Proxmox Backup Client or backing up a container, it has to read all of the data, which results in significantly more disk IO and more system usage. The dirty bitmap really speeds up backups of a lot of VMs, and I'd recommend trying to only do frequent backups of running VMs. Taking a look at the benchmark numbers when it comes to hardware selection: get a faster-clocked CPU if you can, especially if backup speeds are a major issue. Higher clock speeds seem to make a big difference here, and PBS doesn't seem to be very well threaded. The next thing is, if your garbage collection tasks are taking a while, look at adding a special device or just going full SSD, as that will speed them up significantly. If going full SSD is feasible, it will speed up a lot of operations and make things a lot quicker if you can get away with it. 
Another thing to note is you can only back up a single VM or container on one host at a time in PVE, because otherwise the backup will be locked. One way to get around this is to add multiple hosts. So if you're backing up a large number of VMs in a large Proxmox cluster, maybe add a few more smaller nodes instead of having fewer bigger nodes, as it'll make backups run faster because you can parallelize them better. A lot of these tips boil down to getting faster hardware or more hardware, but there's only so much that can be done with optimization, and a lot of it unfortunately comes down to just faster hardware. And whoa, that was a lot of information about PBS. I think I covered nearly everything about PBS other than tape backups. Let me know if you think I missed anything in PBS, and if there are any other tests you'd like me to run to get some better numbers or any extra information. Hopefully you found this video useful, and it helps you back up your VMs better. Thanks for watching.