Homelab Project - Proxmox High Availability Cluster with Ceph

Captions
All right, welcome to the video. Today we're going to do a fun little homelab project: setting up and configuring a Proxmox high availability cluster, then adding Ceph distributed object storage on top of it. We'll show failover between virtual machines, some things that went wrong while I was testing, where things went well, and so on. We're starting with a basic Proxmox instance running here, and what we need to do is create three virtual machines inside that Proxmox instance that each run Proxmox themselves. We'll then configure those three virtual machines as a high availability cluster and use that to show failover. So we'll set up each of the Proxmox virtual machines, and once those are up and running we'll jump in and get the demonstration started.

All right, now that we have our Proxmox nodes set up, we can go into each of them and start to configure a cluster. Starting here on the screen, this is my root Proxmox host, where you can see the three other Proxmox instances we created; their consoles show that the web admin site is up and running. Up in these tabs I have the admin interfaces loaded for each of the three Proxmox instances. We're going to start on the first one, proxmox cluster 1. We're not going to install Ceph yet — we'll get to that in a bit. We're starting simple, without the distributed file store. Go into the Datacenter configuration, go to the Cluster tab, and hit Create Cluster; we'll just call it cluster1, keeping things simple for playing around in the homelab environment. Then create the cluster.

All right, that's done, and you can see we have one node in the cluster. Next we need to go to each of the other nodes and join them. Here is the join information: open it up, click Copy, go to the second machine, click Join Cluster from the Datacenter interface, paste in the join information, enter the peer's root password, and click Join 'cluster1'. While that runs, we'll swap over to the third machine, click the Datacenter tab, click Cluster, and do the same operation to join the cluster.

All right, both of those nodes have joined the cluster, so going back to the first tab, the main cluster host, we should see them all — and there they are, cluster nodes one, two, and three. You do need to refresh the browser on the other nodes to see the updated view; I'm not even going to log back in to them, because we don't need to anymore. From this first instance we can manage all three instances in the cluster from one interface, which is really nice.
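(As an aside, not shown in the video: the same cluster creation and joins can be done from a shell on the nodes with Proxmox's pvecm tool. A minimal sketch — the cluster name, node names, and the first node's IP address are just placeholders matching this demo:)

    # On the first node: create the cluster
    pvecm create cluster1

    # On each additional node: join, pointing at the first node's IP
    # (this prompts for that node's root password)
    pvecm add 192.168.1.101

    # From any member: check quorum and membership
    pvecm status
    pvecm nodes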
Now that we have a cluster set up, we'll create a very simple VM and show how migration between cluster nodes works. We'll keep it simple: create a VM on one of these nodes with its disk stored on local storage. That's very easy to set up, but it comes with a number of downsides in terms of migration; it's just a starting point to show how migration works, and you'll see it's probably not the best setup — we'll dive into that a little more. So what I'm going to do now — I'm not going to install Ceph yet; clearly this thing knows where we're going, but we're not doing that — is go into the first cluster node, go into the local storage, and first upload an ISO. We'll use it to create a simple Ubuntu server on this cluster node, so I'll go off and do that real quick; I'm sure you've seen that a thousand times.

All right, now that we have our Ubuntu server up and running — and by the way, it took a really long time to install, probably a good hour or two; I don't know why, maybe our Inception setup here isn't the most efficient thing. For reference, the root Proxmox instance is a Dell R730xd server with 56 cores and 256 GB of RAM — not the newest machine, but it definitely has some beef. Anyway, the install took a long time, but we do have our server up and running, which you can see here, just ticking away doing its thing; I'm logged in at the console.

What we're going to do is show a really simple migration. Remember, this server has its disk stored locally: if you go into local-lvm you can see the disk. If you want to migrate this to another node in the cluster, it's as simple as right-click and Migrate. Before we do that, let's pause and explain a little about what's going on. This is the simplest form of migration. Other videos out there will get into all the datastores you can add, like Ceph or NFS, that make migration easier and faster and allow live migration, and that's all great — but don't discount the benefits of a really simple environment with simple migration using local storage. Here the VM disk is stored locally, so think about what has to happen to migrate the machine to another node: if you want to move this down to cluster node two, there's no VM disk down there, nothing to start this VM from. The migration process has to take the entire VM disk, move it to the other machine, and restart the VM there. And that's fine — don't underestimate the usefulness of that. If you have a simple homelab environment, maybe just a couple of Proxmox nodes, and you want to be able to move VMs around when you need to take some downtime, you don't need to run a distributed file system and deal with the complexity that comes with it. It might be fun to play with, but just running your VMs on simple local storage and moving them when needed — which may not be very often — could work for you, and it's super simple.

So, to show this, we'll right-click, choose Migrate, and pick where we want to send it — in this case cluster node two. It warns that migration with a local disk might take a long time, which makes sense, because it has to copy a 16 GB disk to the other host. I'll click Migrate and we'll watch what happens. As you can see, it's migrating the storage over to the other node, and remember we're running in the full Inception environment, so I/O is definitely going to be slow, as evidenced by my multi-hour installation time. We'll let this finish up and then show what happens once the VM is migrated over.
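(Not shown in the video, but for reference: a migration like this can also be started from the shell with qm. A rough sketch, where the VM ID, target node name, and storage name are examples:)

    # Restart-style migration of VM 100 to another node,
    # copying its locally stored disk along with it
    qm migrate 100 proxmox-cluster-2 --with-local-disks

    # If the destination lacks space on the default storage,
    # a different target storage can be chosen
    qm migrate 100 proxmox-cluster-2 --with-local-disks --targetstorage vm-drives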
All right, the migration is complete — and if you were paying close attention, there was actually an error on the first attempt and I had to restart it and tweak things a little. The problem was that I didn't have enough space on the destination cluster node for the hard drive, so I went back and quickly added some additional storage — you can see it over here on the left side, I added these VM drives — just so there would be a destination with enough space, and then started the migration again. Looking at the log (let me zoom in), it transferred the 16 GB of disk storage and then transferred the memory state over to the other machine and finished up. And if you look at the console window behind us, the ping I started back when this was running on cluster node one is still going just fine. So there's a simple demonstration of a VM migration using local storage: it wasn't exactly quick, but it got the job done.

I was getting ready to transition this video into a section on Ceph and show how to set up distributed storage, which makes migration much better, but after I stopped recording, just before I logged off for the night, I decided to try something. To recap: I showed you a simple migration of a VM from one cluster node to another using local storage, we watched it copy everything over, and everything went great. I was curious what would happen in the case of a node failure — what a migration would look like in that case with local storage. Now, if a node simply dies, you wouldn't expect the VM to migrate: it's an unexpected shutdown and the data is stored locally, so I think we all know how that would go. But what I was curious about was this: if I have a VM sitting on local storage and I go into the node and explicitly choose Shutdown, is it smart enough to migrate the VM and then shut down the node? To jump to the answer: no. It did not go well at all, and I ended up destroying a VM in the process. I'll show you what I did.

We'll get into this more in the Ceph section, but at the Datacenter level you can set up high availability groups, which describe how your cluster behaves in the event of a failure and how VMs move around in that case. For a quick example I'm going to create one high availability group, and I'm only going to put nodes two and three in it, because those are the nodes with the extra storage I added. You can set priorities and other options; we won't do that right now. So I created the group, and now I need to add a VM to it. I'll add server one — the only VM I have, VM ID 100 — and assign it to the group. (I think my old test group is still in here; let me confirm... I messed that up, so I'm going to remove that group.) So: one high availability group with two nodes in it, no priorities set, and my single VM added to the group.
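(For reference, the same HA group and resource definitions can be created from the shell with ha-manager; a minimal sketch, where the group name, node names, and VM ID just mirror the example above:)

    # Create an HA group containing nodes two and three
    ha-manager groupadd test-group --nodes "proxmox-cluster-2,proxmox-cluster-3"

    # Put VM 100 under HA management and assign it to that group
    ha-manager add vm:100 --group test-group

    # Check what the HA manager thinks is going on
    ha-manager status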
So again, what I wondered was: given this group setup, in theory, if node three goes down, the cluster should move the VM to node two — and I wondered whether that would happen during a managed shutdown. So I came in — you can see my VM running here, all good — clicked on node three, and selected Shutdown, and we'll watch what happens. While this thing shuts down, let me show you one thing: if we go into the VM storage, you can see the disk associated with VM 100. Down here in the console it has started the shutdown process, so this might take a minute.

All right, we can see that cluster node three is shut down, and if I go back out to my root Proxmox instance, you can also see node three is down. In theory we should see the VM move over to node two, since this was a managed shutdown, so we'll watch for that. And there we go — the VM, server one, just popped up here on cluster node two, and if you look down in the tasks you can see the VM 100 start command. But you'll notice there's a bit of a problem: the VM 100 start task says "error: no such logical volume." Let's dig into why that is. To show this, I'm going to go back out to the root instance and restart node three, because we're going to need to look at the storage on it. Meanwhile, this VM keeps trying to restart, and it keeps reporting "no such logical volume" for disk zero. If I go onto node two and look at the VM storage, there's no VM disk here — there's no VM disk anywhere on this node. So when I did the graceful shutdown of node three, it migrated the VM definition, but it did not migrate the disk associated with it. Okay, node three is back up, and if we go in here, there is the disk belonging to the VM that is now sitting over on node two.

So obviously that was a fail, and as it stands the VM is broken until we can reassociate the disk with the VM definition. It's another example of a potential downside: if you build your Proxmox cluster and stick with local VM storage, you're not going to get a migration that works on shutdown — you'll have to migrate the VMs manually and then shut down cluster nodes yourself. It's a very simple configuration, as I mentioned before, and if that works for your use case, by all means go for it.
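(Not shown in the video, but for context on "reassociating" the disk: the VM's config file was moved to node two while its logical volume stayed behind on node three. Since Proxmox keeps VM configs in the cluster-wide /etc/pve filesystem, one way to put the pieces back together — an assumption on my part, not something demonstrated here — is to move the config back to the node that still holds the disk; node names and the VM ID below mirror the example:)

    # Confirm the logical volume really lives on node three
    ssh proxmox-cluster-3 'lvs | grep vm-100'

    # If the VM is under HA management, take it out first
    ha-manager remove vm:100

    # Move the VM definition back next to its disk
    # (run from any node; /etc/pve is shared across the cluster)
    mv /etc/pve/nodes/proxmox-cluster-2/qemu-server/100.conf \
       /etc/pve/nodes/proxmox-cluster-3/qemu-server/100.conf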
What we'll do now is jump into setting up Ceph and distributed object storage. It will definitely add complexity to the configuration, but it will make some of these use cases actually work.

All right, now we're going to look at using a distributed storage service with our Proxmox cluster, and we're going to use Ceph, since that's the fun thing to do — it's probably one of the best options out there, and it's integrated nicely with Proxmox. You can go super deep into Ceph; there's lots of information out there, books, tutorials, and you could even run a Ceph cluster on its own if you want to play with it. But to simplify things and give you a flavor of what Ceph does, you can think of it as a storage service that makes all of your VM disks available across all the nodes of your cluster. If you have a VM running on one node, the disk for that VM is available on every node, and you can see how that makes things like migration much easier and faster. Ceph is a standalone service that abstracts the storage and management of disks, and when you're running Ceph with Proxmox, each node contributes local disks to the storage service — you'll see this when we set it up. You set up nodes with a bunch of disks and contribute them to the storage pool, which is managed by Ceph. Each local disk is accompanied by something called an OSD, an object storage daemon, which is responsible for managing the storage on that disk: reading and writing the data, reporting status, and participating in the replication process — because, unsurprisingly, in Ceph your data is replicated and safe, and you can control those replication parameters when you really get into it. On top of that, Ceph uses monitor processes to maintain the health of the cluster and keep metadata about the overall cluster map and configuration. That's a very high-level view of what's going on, but hopefully it's enough context to understand the pieces we're about to configure. We're going to set up each of these through Proxmox: the storage, the OSD daemons, and the monitors.

The first thing we need to do is add several disks to contribute to the storage pool. I'm going to go back to my root Proxmox node — the Inception host holding each of the three Proxmox instances — and add disks to those nodes, which we'll then incorporate into our Ceph service. For demonstration purposes, and for fun, I'm not going to add just one disk to each node; I might add a couple, and you'll get to see how all these disks come together in the Ceph storage pool. So here I have my root Proxmox instance with the three Proxmox cluster nodes running. I'll go to the first one, go to Hardware, and add a hard disk on local storage — 32 gigabytes is fine, we're just messing around. On the second node I actually already have another hard disk, which I had mapped to the VM storage pool from the previous example, but I'm going to add more — in this case I'll add two. Then on the third node, why not be creative: I'll add three, and make them a different size, say 16 gigabytes each.

Now that the hard drives are set up, we need to go back onto the cluster nodes and get Ceph installed. On the first cluster node, click on Ceph, and yes, we will install Ceph. Since we do not have an enterprise license, make sure you pick the no-subscription repository, and we'll pick the latest Reef version of Ceph and start the installation. All right, the installation is done, so now we move over to the configuration, where it asks us to configure the public network for the Ceph cluster. Remember, Ceph is a distributed storage service, so it can run on its own network — you can configure the connections and nodes and build it out in a specialized way for highly scaled environments — but in this case we're just running on a single, very virtualized server, so we'll use the same network and leave everything else at the defaults, with the number of replicas at three. So we're all set; the next thing to do, as it says here, is install this on the other nodes, create the monitors and OSDs, and then finally the storage pool.
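(For reference, the install and initial configuration also have shell equivalents via pveceph — a rough sketch; the repository flag is available on recent Proxmox VE releases, and the network below is just an example standing in for "use the same network":)

    # On each node: install the Ceph packages
    # (no-subscription repository, matching the GUI choice)
    pveceph install --repository no-subscription

    # On the first node only: initialize the Ceph configuration,
    # using the existing LAN as the Ceph public network
    pveceph init --network 192.168.1.0/24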
So we'll click Finish, go to the next node, and go through the same process. It's a bit easier this time, since we don't need to do the configuration — it's inherited — but we do need to make sure to choose the same installation settings. Installation on the third node is done and the configuration is already initialized, so we're all done installing Ceph on our three nodes. This moved over to the Ceph information page, and you can see a warning that we do not have any OSDs — it isn't managing any disks yet — which is no surprise.

So I'm going to go back up to the first cluster node, and first we'll set up some monitors. We already have one monitor on the first node; I'm going to create monitors on the second and third nodes, and you'll see them show up in a second. Perfect. Now we'll create OSDs to manage the disks we have in place. Still on the first cluster node, if I go to Disks, you can see the disks I added before: one disk on this node; on the second node, the two disks I added plus one already in use — that LVM one is the VM storage from earlier; and on the third node, three smaller disks. Starting on the first node, we'll go to OSD, Create OSD; it figures out the disk (there's only one), we won't encrypt it, and we'll leave everything at the defaults — and there's our first OSD running on our first cluster node. We'll go over to the second cluster node and do the same thing, twice this time, because we have two disks here; the dropdown shows the two unused disks we added to this node, and we'll create an OSD for each. Reloading the OSD screen, you can see we now have three OSDs. Then on to the third node, same thing, three times. Okay, we have all of our OSDs created — one, two, three nodes, six drives in total; the last one is just being set up and will pop in here momentarily.

If we go to the Datacenter tab, we can see the overall health of our Ceph cluster: three monitors and one manager process, which is all we need for this demonstration. If the cluster were actually doing any reads or writes we'd see that here, and we can also see the overall usage of the cluster. The next thing to do is create a storage pool so we can put some VM disks in it. Back on cluster node one, click on Pools, create a new pool, and name it vmpool. We'll leave everything else at the default values; you could dig into how to tune these appropriately, but for playing around and seeing how it works, the defaults are fine. Perfect — the vmpool is up and running, and you can see it on each of the cluster nodes. Just as we described, it's a storage service that is now visible to every cluster node, and any data we put into the storage pool is visible in real time from each of those nodes.
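(Again for reference, the monitor, OSD, and pool steps have shell equivalents; a minimal sketch, where the device names and pool name are examples:)

    # On each node that should run a monitor
    pveceph mon create

    # On each node, one OSD per contributed disk
    # (device names will differ from node to node)
    pveceph osd create /dev/sdb
    pveceph osd create /dev/sdc

    # Create the RBD pool and register it as Proxmox storage
    pveceph pool create vmpool --add_storages 1

    # Check overall cluster health and the OSD layout
    ceph -s
    ceph osd tree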
What we want to do now — I removed all the VMs from the earlier example — is go through the process of creating another VM, making sure its drive is stored on this vmpool, and then show how we can migrate the VM, set up high availability groups, and so on. But first we'll create a VM yet again, so I'll go off and do that real quick.

Now that the new VM is up and running, you can see it here in front of us, pinging away, and over on the side you can see how the storage is set up for this VM. Remember, we created this vmpool, which is our Ceph storage pool, and you can see the disk for the VM. The VM is running on node one, and you can see the disk on node one — but if you go down to node two and node three, the disk is there as well. So the disk is visible across all the nodes while the VM is running.

Before we go set up high availability groups, let's just show a simple migration and see how long it takes. I'll right-click like I did before, choose Migrate, pick cluster node three, and migrate it. The ping running in the background should continue, and you can see in the window that what it's transferring is the VM state — the memory state of the VM — because the disk is already available across these nodes. There you go: the migration completed and reports a downtime of 79 milliseconds. Closing that, you can see on node three that the server is up and running and continuing to ping away. So that was a successful migration, much, much faster, even in this highly virtualized, non-optimal environment — the difference is very apparent.
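(A sketch of the same live migration from the shell — the VM ID, node name, and pool name are the examples used above. With the disk on shared Ceph storage, only the memory state has to move:)

    # The VM's disk lives in the shared RBD pool, visible from any node
    rbd -p vmpool ls

    # Live-migrate VM 106 to cluster node three; the guest keeps running
    # while its memory state is transferred
    qm migrate 106 proxmox-cluster-3 --online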
Now let's configure some high availability groups and show some automated failover. Go up to the Datacenter tab, click High Availability, and create a group — a set of nodes in the cluster configured together as an availability group. If you had a very large cluster with different nodes and configurations, you could imagine setting up different groups to determine how VMs should fail over across them; we'll keep it simple, call this one "TG", and put all the nodes into it. All right, the test group is created. Now, on the High Availability tab, I have to add resources to that group, so I'll add the one VM we have to the one group we have. There we go; that's queued and we're all set.

So we have this VM running, pinging away, and what I'm going to do now is test the case that failed completely before, where I did a graceful shutdown of a node and it completely messed up the VM. I'll go over to cluster node three, choose Shutdown, and we'll watch the VM and see what happens. You can see the shutdown command in the console — there it is, bulk shutdown — and now the console is gone, so I'm guessing this cluster node is offline. Swapping back to the High Availability tab on the Datacenter, you can see it shows cluster three in shutdown mode — I think it may eventually just report it as dead, but it understands it's in shutdown mode, and here you can see it turned red. What we're watching for now is the migration of the VM. Here we go: the timestamp went stale, it recognizes the node is out, so let's see where it decides to put the VM. Interestingly, we see VM 106 shut down, and then it moved the VM up to the first cluster node — and it looks like it did not do a live migration. I actually wasn't expecting that; I thought a graceful shutdown would transfer the VM memory state over to the other server, but that apparently didn't happen. Maybe there's a piece of configuration I missed, or something else, but when we did the migration a few minutes ago, that was a live migration from one machine to the other, and in this case it wasn't. While this VM starts back up, which you can see here, I'm going to go back out to the outer Proxmox node and restart the node we shut down. So, in the event of a cluster node shutting down, planned or unplanned, there is definitely some downtime for the VM; live migration works much better.

All right, our VM is back up and running. Let's try one other thing: let's go back into the group configuration. We have the VM running on cluster node one, and I'm curious what happens if we simply remove cluster one from the group — I don't know if it will do anything. Oh, interesting: we changed the group definition and it immediately started a migration of the server to another node. I'm curious whether it does this as a live migration — and there you go, it did. It moved the VM over to the second node in the cluster, which is one of the two we left in the group. That was pretty neat, but I think this is a good time to wrap up.

We've covered a lot of ground. We set up three Proxmox VMs from scratch running on another Proxmox instance; we created virtual machines on local storage and showed how migration works there, and how it can go wrong and actually leave a VM in a broken state; we installed and did a simple Ceph configuration with a number of hard drives, varying the number of drives per host just to see how that works, and set up our OSDs, monitors, and so on; and then we showed how a Ceph cluster supports live migration, faster migration, and high availability configuration. This is really just a jumping-off point — you can go much deeper into all of these topics and customize this for your particular needs — but from a homelab-project perspective, if you're looking for something to do, this was a ton of fun. I learned a lot going through it, and it might cause me to change my overall homelab setup, because I think I have another old machine running Proxmox in an unclustered configuration, so maybe that turns into another weekend project. I hope this was super helpful; if you have any questions or comments, please leave them below, and we will see you next time.
Info
Channel: Sonoran Tech
Views: 2,168
Id: eN4jDvmssYs
Length: 31min 7sec (1867 seconds)
Published: Thu Mar 14 2024