Split A GPU Between Multiple Computers - Proxmox LXC (Unprivileged)

Captions
Hey everybody, welcome back to Jim's Garage. In my last video we focused on LXCs, and that was a basic overview of how to create them in Proxmox using a Debian 12 image. As I said in that video, these are not exclusive to Proxmox; they're part of Linux. I did mention that I wanted to come back and do hardware passthrough, so that's exactly what we're going to do in this video. And I'm not just going to demonstrate it using a privileged container. We're actually going to be using unprivileged containers, and I'll even show an example where I'm sharing a single GPU and splitting it between three LXCs.

What does all that mean? In the previous video I talked about privileged versus unprivileged LXCs. An analogy might be running a Docker container in privileged mode: if that container were compromised, it could potentially impact the host. The same sort of thing applies to LXCs. If we run a privileged container and that container is compromised, it could have a significant impact on the host, i.e. it could lead to a full compromise. Using unprivileged containers typically comes with some trade-offs, i.e. you can't do advanced things like passthrough without some complex setup.

Well, I've scoured the web, I've been playing, and I've come up with such a config. It allows us to pass through a dedicated GPU or an iGPU. I've only tested this with Intel, but it should work for any GPU, because the process will be the same. It has let me share that GPU between three LXCs with all the security benefits of unprivileged containers.

Because there are a few config steps to go through, I don't know exactly how well this will last. The instructions I'm using are about two years old and they still work, so that bodes well: back then Proxmox was running on Debian 11, and it still works now on Debian 12 and the latest release of Proxmox. So let me show you how to get this set up. I'm really excited by this, and I will
continue to go down this path, because I would love to be in a position where I can run my Kubernetes infrastructure on LXCs that all share a single GPU. At the moment, the issue I have with VMs is that on consumer hardware you can only pass a GPU through to a single VM. I know you can do it with enterprise hardware, but that's really expensive and not very energy efficient either. This approach would mean that I could have half my Kubernetes nodes on one physical machine sharing a single passed-through GPU, and half on the other machine sharing its own single GPU. Because in my case they're both Intel, my worker nodes should be able to assign pods to either. It might not be elegant, because remember you can only attach one pod to one GPU, so for every pod I want to have a GPU I might have to spin up a new LXC, i.e. a new node. But it would give me that capability with very minimal overhead.

Anyway, enough waffling; let's jump into the configuration. Back over on my beloved test bench for all things Proxmox and GPU passthrough, you can see it's an old one: a 6700K, which is a 6th-gen Intel. Over here you can see that I've got three CTs (three containers) spun up, like I showed you in the previous video. If we pick one of those and look in the Options, we can see that it's an unprivileged container: yes. If we connect into that container via the console and run ip a, you can see that the IP address here ends in 7.42. If I load this up in my browser, I've already deployed Jellyfin on here, and I'll show you that I've got GPU acceleration for transcoding working. That's going to show a) that it works unprivileged and b) that all three of these CTs are sharing the GPU (I'll show you that later). I've got everybody's favourite testing video here. If I start this, head down to the cog and make sure I've set the quality to something other than native (I'll change it to 480p just for this video), and then if I
click the "i" here, you can see that it's transcoding at 310 FPS, so that's definitely using the GPU. So how did I get here? Let's get back into Proxmox and I'll show you.

The trick to this is user permissions. An LXC, as I mentioned in the last video, shares the underlying host's infrastructure, i.e. its kernel, its folders, its devices, etc., but it puts those in its own namespace, especially when unprivileged. That creates segmentation between the two environments, which is good for security but bad for device sharing. So what we need to do is create relationships between the host's users and groups and the LXC's users and groups. In summary, we need to make sure that the LXC has access to the specific devices we need, in this case the GPU.

So let's dive into the shell for our Proxmox host. The first command I'm going to run is cat /etc/group. This shows all the groups on my Proxmox host, and the ones we're interested in are video and render, because those are the ones we need to give the LXC access to, and Jellyfin running in Docker will then use them. Running that command, you can see that render is 104 and video is 44. By default those groups belong to the host, and you'll see that they aren't mapped into any container's user namespace. We'll come on to that in a moment, but that should give you a hint as to what we're doing here.

Now that we have those group IDs, we can move on to the next step, which is to amend the subordinate group IDs in /etc/subgid. If I go into that file, you'll see that there's already a default mapping at the top (root:100000:65536). I've also added two mappings here: root:44:1 and root:104:1. Remember, 44 and 104 were the group IDs we just found, and the trailing 1 means a range of exactly one ID, so root is allowed to map those specific host groups into its containers. We'll use that in the next step. Don't worry, all of these commands will be available on my GitHub, and you can find that in the
description below. Now that we have that done, we're on to the complex part. We need to change directory to /etc/pve/lxc. If I do an ls in here, you'll see that I've got 101.conf, 102.conf and 103.conf, and if you look on the left, those are these CTs here. I'm going to open one of these just to show you what it looks like, as this one is already working, so let's open 103. There are a few things in here. Firstly, ignore the green section at the bottom: that's just a snapshot. It's a good idea to have those, and I left it in because I think it's good practice, but it's the top part we care about. Most of that was created by the template, i.e. the process I walked you through in the last video. The bit we're concerned about is everything underneath the unprivileged line.

What do these lines mean? Where you see lxc.cgroup2.devices.allow, those entries refer to devices on your host. How do we find them? If I come out of this for a second and run ls -l /dev/dri, it lists the GPU devices on the host. Yours might look a bit different, because my setup has both an iGPU (that 6700K has an HD 530 on it) and, as you know from my previous videos, an Intel Arc A380. If I run lspci, you can see both devices: at the bottom, under 03, is the A380, and if I scroll up, under 02, is the HD 530. Let's clear that and go back to the previous command. The important things here are the device numbers 226:128 and 226:129, whose minor numbers, quite handily, match the node names renderD128 and renderD129. Traditionally the iGPU is picked up first and the discrete GPU second, but that's not always the case, so you will have to determine which one is which, but the 128 in
my instance is the iGPU and 129 is my dedicated GPU. So let's head back into that config with this knowledge. In here you can see that I've allowed device 129, so that's the discrete GPU. Next, I've mounted renderD129 as renderD129: a bit like a Docker mount, we've got the host path on the left and the path inside the LXC on the right. It mounts the GPU at exactly the same path it would have on a physical machine, because that's what the software expects. You could probably change this, but you would have to do more tweaking down the line.

After that we get to the secret sauce that makes unprivileged work, and that's lxc.idmap; the clue's in the name. We're mapping the IDs we saw earlier, the ones we added to /etc/subgid, which on Proxmox are owned by root, and we're saying that this LXC can use them, but only those specific groups. So the container is still unprivileged, but it has hardware access to those devices. That's great, because we're following the principle of least privilege: we're giving it access to just the parts it needs without risking anything else on the host, albeit there's always a slight risk with containerization.

The first line has a u, for users; that's the default user mapping, which we didn't need to change. The next lines all have a g, for groups. So what do they mean? The first group line, g 0 100000 44, maps group IDs 0 to 43 in the LXC namespace to 100000 to 100043 on the host. The next, g 44 44 1, makes group ID 44 the same on both sides, so the video group inside the LXC is the host's video group. The next sections get a little more complicated, because we're using slightly different group IDs in this
instance: 107 and 108. You'll also notice 104 here; remember, that was the ID of the render group on the host. It's being passed through, but it appears as group 107 inside the LXC, which is the render group's ID in the Debian 12 container. The lines either side of it map 45 to 106 and 108 upwards back into the normal 100000-based range, so there's no overlap between the mappings.

Once we've completed that, there's only one more thing to do, and that's to change the group membership on the Proxmox host for the render and video groups. You saw those groups in the first thing I did, and now I'll show you how to run it. It's similar to adding a local user to the docker group so that you don't need to type sudo each time: we run usermod -aG render,video root, which adds the root user to the render and video groups. This will allow root within the LXC access to the host's device.

Once you've added that, you'll be in the position I'm at on screen. Now I'll walk you through setting up a test container, test GPU 04, to put this all into practice, and hopefully by the end of it we'll have another unprivileged LXC with GPU hardware acceleration.

So let's create a new CT. I'll click the wizard up here (I go through this in more detail in my previous video) and call this one test GPU 04. Unprivileged is ticked, exactly what we want. I'll give it a password, load my SSH key file and hit Next. Crucially, I'm going to use this Debian 12 template because, as I mentioned in my last video, the current version of Proxmox I'm using is based on Debian 12, and it's a good idea to match the LXC with the host. I'm giving this one just 20 GB for testing, but obviously change this to whatever you need. For CPU I'm giving it two cores, and I'm limiting it to two cores as well. For memory I'm giving it 3 GB, and I'm setting networking to DHCP so it gets an IP address automatically. Everything else is going
to stay the same; DNS will use the host settings. Everything on there looks fine, so I'm going to finish this. If you want a slower walkthrough of all those steps, do check out my past video.

Now that it's created, if we do an ls we should see 104.conf, and yes, we do. We now need to copy the config from the previous container and paste it into this one, and remember all of this documentation is available on GitHub, so don't copy it off the screen. I'm going to open that file and add the lines I described earlier. The important thing here is to focus on the device IDs: in this instance the template has used 128, which is my iGPU. You can leave that if you're only using the iGPU, but again, make sure these IDs match up; you can check them with the ls -l /dev/dri command I showed earlier. In this instance, though, I know that my dedicated GPU, my Arc A380, is 129, and that's how I've configured this. If I wanted to, I could change this to use my iGPU, and maybe I will in the future, but this also means I could fully use the iGPU for something else. This all looks good, so let's press Ctrl+O to save it, and we should now be in a position to spin this up.

Coming out of there, I'm going to click on this container and click Start, and fingers crossed I should be greeted with the login prompt if all of this went correctly. Yes, I am, so let's log in. That looks good. You'll know if something's wrong, because typically if you try this hardware sharing without the permissions in place, the unprivileged container will fail to start and you'll see the error in the Proxmox logs. If that happens, go back, double-check your permissions, and then hopefully when you start it you'll get through this process. If I run the command to list the devices (ls -l /dev/dri), you can see here that renderD129 is available, which is my GPU, and
now let's double-check with lspci that the device is visible. I'll need to install it first, because it doesn't ship with the Debian 12 template, so I'll run apt update to refresh the repositories and then apt install pciutils. Now that that's installed, I can run lspci, and you can see all of the devices from the host, and importantly the VGA compatible controller, Intel Corporation, down here, which is my A380.

Because I've already covered Docker before, I'm going to install Docker and then install Jellyfin as a container. So that's a containerized app running on Docker inside a containerized operating system. It's getting pretty complicated, but the beauty of that is that we can reuse the existing setup I've shown in previous videos, here, with all the benefits of the LXC. So I'm going to install Docker, and I'll see you on the other side.

Now that Docker's installing, the last thing we need to do is grab the Docker Compose file I've used in previous videos and make a few tweaks specific to this setup. Thankfully, all we need to change are the groups we add and the hardware IDs; other than that, everything should stay the same. I'm not going to run this behind a Traefik proxy, just for simplicity, but the labels will work exactly as before, especially if you're also going to be running Traefik in this Dockerized LXC.

Now that Docker has completed, let's hop into VS Code and look at the Docker Compose file to get this up and running. I'll call out the salient parts; I've already been through Jellyfin a number of times, so check out my videos for both Docker and Kubernetes if you want to know more. Firstly, I'm going to run this as root, because we gave the root user access to the device, so I'm deleting the user line. You could add a user in there if
you wanted to, but then, going back to the first step I showed you at the beginning of the video, we'd have to make sure that whatever user we specify within this container is the one mapped to the host's groups. We used root; if we wanted a jellyfin user, we'd have to add jellyfin there.

Next is group_add. Let's run this command here, which tells us what to put in it. By default the command checks renderD128, but we're not going to use 128; we're going to use 129, because 129 is my dedicated GPU and 128 is the iGPU. Back over on the machine, let's paste in the command and see what it says: it says 107. Hopping back into VS Code, I'm going to change this to 107, so now we've got the right group assigned.

All I've done here is tweak the volume locations to be ./, which means they're created as subdirectories of the Docker Compose folder. You can obviously change that to whatever you want; I'm just doing it for simplicity in this demonstration. I've also created a folder in the root directory called /films, which is what I'm putting in here as /films, and I'm going to download the Big Buck Bunny video into it next. Because I'm not running this behind a reverse proxy in this demonstration, I've just opened up the ports. Finally, I'm doing the actual device passthrough: here I've passed through /dev/dri/renderD129 as /dev/dri/renderD129. Again, that's my dedicated GPU; by default 128 would be my iGPU, so if that's what you've got, you can put 128 in there.

Now that's all done, we can copy the file over to our LXC. You could obviously connect remotely; I just haven't bothered to set that up with all the firewalls yet. Once you're on the other host, create the folder structure and put that file in place. I'm going to paste all of that in, save it and come out of there. As you can see, the Docker Compose file is in my Docker Compose Jellyfin folder. I'm now going to download the film to my films directory; this is going to take me a
little while, so I'll see you on the other side. Once this is done, we should be ready to fire up the container, and hopefully we'll have working transcoding in an unprivileged LXC.

Now that's finished, let's move back to our home directory, run this command, and do a docker compose up -d. Hopefully that will pull the image without any errors, and we'll be able to access Jellyfin on the IP of the LXC. Now that it's completed, let's do an ip a; I'm going to connect to 7.143 on port 8096. Hit return on the IP address and, excellent, we're into Jellyfin.

Let's quickly walk through the setup. I'm just going to use root as the user. I'm going to add the media library: add Movies, add the folder (hopefully films should be there, and yes it is), then Next, Next, Next, Finish. Let's log in, and we've already got Big Buck Bunny. Excellent. But by default there's one thing we need to change: heading into the Dashboard, go to Playback and then Hardware acceleration. I want Intel QuickSync, because this is an Intel GPU. I'm going to turn on all of these codecs, because my A380 supports all of them. There are some additional options here for low-power encoding; those won't work by default because we didn't enable that, and I might cover it in a later video if you want to see it. I'll allow encoding in HEVC because it's much more efficient, and then I'm going to hit Save. Yep, that saved.

Now hopefully we should be able to click on this. Great. If that hadn't worked, we'd have got a format error. If I hit the cog now, let's put it down to something like 480p. Great, that's still working. If I duplicate this tab, go into the Dashboard and hit the "i", you'll see that it's transcoding at 387 FPS. And if I dive back into Proxmox and click on the host itself, you'll see that it's only using a small fraction of the CPU, which clearly isn't doing the transcoding, because software transcoding would max it out. So it is using the GPU, and here we have another unprivileged LXC with hardware transcoding, and I
could demonstrate that by firing up all four of these LXCs, and we should get parallel transcoding, all using one GPU. If you're wondering what that looks like, here you can see I've got 7.143, 7.127, 7.141 and 7.142: those are the four respective LXCs, and I've simply opened the same one below each just to see the dashboard overview. Let's hit go. Replay on this one; this one's automatically set to transcode; let's have a look: transcoding. Let's play this one; this one's set to 480p again; let's have a look: that's transcoding. Let's play this one from the start: this one's transcoding. Let's have a look on the dashboard: this one's transcoding. Let's play this one, set to transcode, and if we hit the "i", all four of those are transcoding in real time. As you can see, there's no stuttering and no buffering: we've successfully shared one GPU between four LXCs, and they're all unprivileged.

If you're wondering what CPU usage looks like, we're currently at about 50 to 60%. Bear in mind this is an ancient CPU, but maybe we could get another five or six transcodes going, so around 10 in total, before it starts to chug. Considering it's a £100 GPU, that's pretty good value. And you can see that once it's done the buffering, i.e. that initial burst of transcoding, CPU usage drops all the way back down to less than 1%, and all of these videos are still playing, which you can see here in the background.

So thanks for watching, everybody. This was a lot of fun to put together, and it actually solves one of the issues I've been struggling with for a while: I wanted the most efficient way to share a single GPU. My Kubernetes setup was getting complex because it was one GPU per VM per pod, which just doesn't scale very well. I'm now seriously considering moving everything into unprivileged LXCs, because I think that's a happy medium. It's going to give me a GPU to share between all of my nodes, and it
means I avoid the extra expense of buying new hardware up front, plus its operating cost, with all the green credentials that come with that. So let me know what you think: is an unprivileged LXC with hardware passthrough going to be something you use, and how are you going to use it? Anyway, if you like this video, give it a thumbs up, hit that subscribe button, and I'll see you on the next one. Take care, everybody. [Music]
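The first host-side steps in the walkthrough (reading the video and render GIDs with cat /etc/group, then adding root:44:1 and root:104:1 to /etc/subgid) can be sketched in Python. This is a minimal illustration, not the script from the video's GitHub repository: the helper names parse_group_file and subgid_entries are hypothetical, and the sample text stands in for a real /etc/group.

```python
"""Sketch: find the host's video/render GIDs and build /etc/subgid entries.

Assumptions (not from the video): helper names are hypothetical and the
sample string stands in for the real /etc/group file.
"""


def parse_group_file(text: str) -> dict[str, int]:
    """Map group name -> GID from /etc/group text (name:passwd:gid:members)."""
    gids: dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) >= 3:
            gids[fields[0]] = int(fields[2])
    return gids


def subgid_entries(user: str, gids: list[int]) -> list[str]:
    """One 'user:start:count' line per GID, each covering exactly one ID."""
    return [f"{user}:{gid}:1" for gid in gids]


if __name__ == "__main__":
    sample = "root:x:0:\nvideo:x:44:\nrender:x:104:\n"
    groups = parse_group_file(sample)
    wanted = [groups["video"], groups["render"]]
    for entry in subgid_entries("root", wanted):
        print(entry)  # lines to add to /etc/subgid
```

On a real host you would pass the contents of /etc/group to parse_group_file and add the printed lines to /etc/subgid by hand, keeping the existing default root entry.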
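The lxc.idmap lines are the easiest part of the walkthrough to get wrong, so here is a sketch that derives them instead of hand-counting ranges. It assumes Proxmox's default 100000/65536 unprivileged range and takes a mapping of container GID to host GID; with video 44 → 44 and render 107 → 104 it should reproduce the group lines described in the video. build_idmap is a hypothetical helper, not part of Proxmox or LXC.

```python
"""Sketch: generate lxc.idmap lines that pass selected host GIDs through
and map every other container ID into the unprivileged range
(container ID n -> host ID 100000 + n).

Assumptions: BASE/SIZE match Proxmox's default root:100000:65536 entries;
build_idmap is a hypothetical helper name.
"""

BASE = 100000  # first host ID of the unprivileged range
SIZE = 65536   # container IDs covered: 0..65535


def build_idmap(passthrough: dict[int, int], base: int = BASE, size: int = SIZE) -> list[str]:
    """passthrough maps container GID -> host GID for the groups to expose."""
    lines = [f"lxc.idmap: u 0 {base} {size}"]  # users keep the plain shifted mapping
    next_id = 0  # first container GID not yet covered by a line
    for ct_gid in sorted(passthrough):
        if ct_gid > next_id:
            # the gap before this GID keeps the default shifted mapping
            lines.append(f"lxc.idmap: g {next_id} {base + next_id} {ct_gid - next_id}")
        # map exactly one container GID onto the chosen host GID
        lines.append(f"lxc.idmap: g {ct_gid} {passthrough[ct_gid]} 1")
        next_id = ct_gid + 1
    if next_id < size:
        # everything after the last passed-through GID
        lines.append(f"lxc.idmap: g {next_id} {base + next_id} {size - next_id}")
    return lines


if __name__ == "__main__":
    # video: container 44 -> host 44; render: container 107 -> host 104
    for line in build_idmap({44: 44, 107: 104}):
        print(line)
```

The design choice here is that every container GID from 0 to 65535 is covered exactly once, and the only host IDs outside the 100000 range are the ones you list, which is why those same IDs also need their /etc/subgid entries.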
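To pick the lxc.cgroup2.devices.allow and lxc.mount.entry values, the video reads the device numbers from ls -l /dev/dri. The sketch below parses such a listing and prints the two lines for one render node. It assumes GNU ls output (major and minor printed as "226, 129" before the date), uses a made-up sample listing shaped like the video's two-GPU host, and only handles the renderD node; some guides also allow the matching cardN node. The helper names are hypothetical.

```python
"""Sketch: turn `ls -l /dev/dri` output into LXC device lines for one render node.

Assumptions: GNU ls formatting; the sample listing below is illustrative,
not captured from the video; helper names are hypothetical.
"""
import re

# e.g. "crw-rw---- 1 root render 226, 129 Feb  6 10:00 renderD129"
LS_LINE = re.compile(r"^c\S*\s+\d+\s+(\S+)\s+(\S+)\s+(\d+),\s*(\d+)\s+.*\s(\S+)$")


def parse_dri_listing(text: str) -> dict[str, tuple[int, int, str]]:
    """Map device name -> (major, minor, group) for each character-device line."""
    devices = {}
    for line in text.splitlines():
        m = LS_LINE.match(line.strip())
        if m:  # directories and the "total" line do not match
            _owner, group, major, minor, name = m.groups()
            devices[name] = (int(major), int(minor), group)
    return devices


def lxc_device_lines(name: str, major: int, minor: int) -> list[str]:
    """cgroup2 allow rule plus a bind mount at the same path inside the container."""
    path = f"/dev/dri/{name}"
    return [
        f"lxc.cgroup2.devices.allow: c {major}:{minor} rwm",
        # the mount target is relative to the container rootfs, so no leading slash
        f"lxc.mount.entry: {path} {path.lstrip('/')} none bind,optional,create=file",
    ]


if __name__ == "__main__":
    sample = """total 0
drwxr-xr-x 2 root root        100 Feb  6 10:00 by-path
crw-rw---- 1 root video  226,   0 Feb  6 10:00 card0
crw-rw---- 1 root video  226,   1 Feb  6 10:00 card1
crw-rw---- 1 root render 226, 128 Feb  6 10:00 renderD128
crw-rw---- 1 root render 226, 129 Feb  6 10:00 renderD129"""
    devs = parse_dri_listing(sample)
    major, minor, _group = devs["renderD129"]  # the A380 in the video's setup
    for line in lxc_device_lines("renderD129", major, minor):
        print(line)
```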
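Inside the container, the Compose changes described in the video come down to group_add (the GID that owns the render node inside the LXC, 107 in the video) and the devices entry. This sketch builds just those parts of a Compose file as text. The jellyfin/jellyfin image, the ./config and ./cache paths and the function name jellyfin_compose are illustrative assumptions rather than the video's exact file, and there is no user: override, matching the video's choice to run Jellyfin as root.

```python
"""Sketch: emit a minimal Docker Compose fragment for Jellyfin with one render node.

Assumptions: image name, volume paths and the helper name are illustrative;
render_gid must be the GID owning the node *inside* the LXC.
"""
import os


def jellyfin_compose(render_node: str, render_gid: int, media_dir: str = "/films") -> str:
    """Return Compose YAML text passing /dev/dri/<render_node> through to Jellyfin."""
    dev = f"/dev/dri/{render_node}"
    lines = [
        "services:",
        "  jellyfin:",
        "    image: jellyfin/jellyfin",  # illustrative image choice
        "    group_add:",
        f'      - "{render_gid}"',  # GID that owns the render node in the LXC
        "    devices:",
        f"      - {dev}:{dev}",  # same path on both sides, like the LXC mount
        "    ports:",
        '      - "8096:8096"',  # no reverse proxy in this demo
        "    volumes:",
        "      - ./config:/config",  # relative to the Compose folder
        "      - ./cache:/cache",
        f"      - {media_dir}:{media_dir}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    node = "renderD129"  # the A380 in the video; renderD128 was the iGPU
    path = f"/dev/dri/{node}"
    if os.path.exists(path):
        gid = os.stat(path).st_gid  # run inside the LXC to get the in-container GID
        print(jellyfin_compose(node, gid), end="")
    else:
        print(f"{path} not found; run this inside the GPU-enabled LXC")
```

Reading the GID with os.stat on the device node answers the same question as the command in the video: which group inside the container actually owns the render node that Jellyfin needs.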
Info
Channel: Jim's Garage
Views: 22,972
Keywords: proxmox lxc, linux container, container in linux, proxmox, proxmox install, proxmox tutorial, virtual machine, linux, debian, ubuntu, proxmox lxc guide, lxc vs docker, lxc container, docker vs lxc, lxc tutorial, pihole, pihole setup, setup pihole, pihole proxmox, lxc vs vm, what is lxc, jellyfin, docker, lxc gpu, lxc hardware, lxc passthrough, lxc, lxc guide, lxc jellyfin, lxc docker, homelab
Id: 0ZDr5h52OOE
Length: 25min 58sec (1558 seconds)
Published: Tue Feb 06 2024