GPU Pass-Through - VFIO - Let's run all the GPUs together!

Video Statistics and Information

Captions
VFIO is back on the Level1Linux channel, and I want you to help us out. [Music]

So we've got some really amazing news. This Velocity Micro system is a Threadripper Pro system, and it is based on Velocity Micro's workstation platform. They've agreed to let me borrow this system to use it for a little bit of mad science here, which is great. If you want to check out this system, there's a full review of it; definitely do that. This is based around the ASRock Creator WRX80. It has fast DDR4 memory. It's a really solid platform. This is an impressive lightweight enclosure that they have it in. It sort of feels like the old-school Lian Li type case. It's got good airflow, it's a pretty good design, it's a pretty solid workstation. We're going to be using Linux on it, and we're going to be using a whole host of 6000 and 7000 series GPUs for VFIO, because there might be some reset problems and things like that, but it's actually a more complicated situation with VFIO and pass-through and everything else like that.

But first, you may be wondering: what the heck is VFIO? VFIO, or GPU pass-through as it's sometimes called, is something that I've been working on for literally a decade, on and off. I think it's the future of computing. It is very, very important to be able to run pretty much anything you want on your computer, and to be able to reach into the past and run things from the past, when this video is super old and we're talking about all this nice new modern stuff as if it is vintage and antiquated. You see, right now you can do some cool stuff: you can go to the Internet Archive, and it does some machine hardware emulation, and then you can boot up Windows 3.1 or OS/2 Warp or whatever in your browser. That's really cool stuff. But there's a lot that we have right now on our existing machines that is legacy from the past, and supporting that legacy sort of slows down modern computing. Enter VFIO, or virtualization. I'm sure that you've
heard of virtual machines. You run a virtual machine, and anything in the virtual machine is kind of sandboxed, and you can emulate whatever you need. 32-bit applications running in a 64-bit context? Okay, that's not virtualization at all; the platform supports that. But there is talk of removing some of the 32-bit extensions, at which time you'll need to do some software magic in order to make that work. VFIO takes that a step further. It lets you access real physical hardware, namely GPUs but also networking cards and everything else, inside a virtual machine. So you could have a computer within a computer, as a software construct, that can access real physical hardware.

Now, the Holy Grail is that we get to a future where you can run multiple operating systems, multiple complete software packages, on a single GPU. We're there in the enterprise, but we're not really there as far as desktop goes, and the utility of that is really undersold. In fact, I've been doing this so long and beating this drum for so long that there are detractors who just don't see the bigger picture and don't see the immediate utility of this. It's like, "I can't see immediately what step two is." But the reality is, if we have this, then people will go nuts doing all kinds of really creative, imaginative things that we haven't even thought of, the same way that happened with virtualization extensions.

You see, almost before we had multi-core CPUs, Intel and other companies added extensions to their CPUs to help with virtualization. Actually, Intel ran into a lot of problems moving from, you know, the 8086 architecture to the 286 architecture to the 386 architecture. The move from 16 bits in your computer to 32 bits in your computer with the i386 was a little bit of a headache, and there were extensions added that helped processes think of their memory and address space sort of monolithically, without considering what the entire rest of the system does. There's a little bit of overlap with how that's handled as
far as virtual machines go. But with this technology you can run Linux in a sandbox under Windows, you can run Windows in a sandbox under Linux, you can run Windows and OS/2 Warp and Linux in a sandbox under something else. You could have something that hasn't been invented yet managing and supervising all of this. I think this is the future of computing: one, because of trust and safety, and two, because computing is so ubiquitous in all of our lives that this level of isolation between our applications is going to be necessary. As we've seen time and time again, commercial companies can't be trusted to respect your privacy and respect the data on your machine. There's always going to be the temptation to mine your data or mine your location or mine whatever. If you sandbox those applications inside virtual machines, it gives you a layer of security and protection that is very desirable as a computer user. As you move your data around, it's like, I want to literally isolate all of my banking stuff behind encryption and a firewall. It's the only tool that we have in our toolbox for safe and secure computing, and we keep not learning that lesson over and over again.

But that's, you know, sort of grand-vision type stuff, and all I'm really concerned about right now is just being able to play games and do fun, interesting stuff with a GPU in a virtual machine on Linux. That is what VFIO gives us. So, to recap: what would be really amazing to have is, when you've got a fire-breathing GPU like the 6950 XT, whether that's from XFX or ASRock or PowerColor or AMD themselves or Sapphire or whoever, that I can use it in Linux and in Windows at the same time. I could have a machine learning job running at the same time that I'm doing transcoding at the same time that I'm doing gaming. GPUs are not really good at multitasking. They're not really good at juggling a lot of
different things going on in the background. There have been stories about fun screw-ups like that. Apex Legends was using the GPU so completely that when you wanted to do streaming with Apex Legends, it would cause very bad things to happen as far as streaming goes, because there wasn't enough GPU horsepower available to do both streaming and playing the game. Which is silly; that was just bad programming. It wasn't even an architectural thing. So I think that extensions for hardware like this should be as ubiquitous as the extensions that have been in our CPUs since time immemorial. But here we are in 2023, and from my point of view it seems like it has taken the entire lifetime of computing to have that feature in consumer-grade stuff.

So while we wait on that capability, to be able to run multiple operating systems at the same time on a single GPU, we can add a second GPU to our system, which is pretty awesome. This first video is really kind of a call to arms. Have you been using VFIO? Do you use VFIO? Would you benefit from the ability to run multiple operating systems seamlessly and with low headache, assuming there are no GPU reset bugs and this is an actual tested use case? Then now is the time to make our voices heard, and not just because I'm doing a cool giveaway with AMD. Yeah, it turns out that AMD and Corsair are sponsoring another video that is going to be about this, where we're going to give away a 7950X-based system that I build around Corsair components, and this XFX 6950 XT, and some other GPUs and some other hardware. We'll cover that in that video, don't worry. But you're going to have to look at the Level1Techs forums for some details about that and this whole project, because this is going to be a pretty big, monumental thing. I've already started a new how-to.

So if you have been following the VFIO space for a while, you know that generally the 6000 series cards are pretty solid and the 7000 series cards have been a little bit more of a
headache for GPU pass-through and virtualization and that sort of thing. The reality, though, is that those cards don't have the same kind of reset and other bugs that we experienced with Vega. There are some rough edges around this use case, but generally it is possible to work around those rough edges on the 7000 series GPUs, at least on the AMD platform. At the time I'm shooting this video, because this is pretty early in the process, I think I can work around it on the Intel side as well. What that means is that for 7000 series GPU use, most of the problems that you will trip over with those GPUs in this use case are really down to the platform and configuration.

You see, PCI Express is undergoing a lot of changes under the hood in order to be able to support PCI Express 5 and things like disaggregation. This is a cool new enterprise technology where your compute, and all of the PCIe lanes that your CPUs provide in this cloud of compute in your data center, is connected via PCI Express to literally everything else. We just eschew the networking technology and everything else, and literally just have everything talk to everything else over PCI Express, not just peripherals but also mixing and matching. So this cloud of compute and a cloud of storage and a cloud of GPUs are all interconnected on this huge PCI Express fabric, and the fabric controller says, "Okay, this cluster of CPUs needs these GPUs and these storage devices and these network cards down here." And that really works well; see also what companies like Liqid are doing, and everybody else. In building out those use cases, and building them out inside supercomputers that are being deployed as fast as they can be built, little changes are trickling down that create some rough edges for us to deal with, and that's why we're looking at this kind of thing. So I started a how-to on the Level1Techs forum that goes through some of this and the BIOS settings that you're
going to want to look for if you're doing this. So if you're having problems with 7000 series reset, now is the time to point me toward somebody that has a reproducible issue with a 7000 series GPU, because I also have the ear of AMD a little bit here. So again, make your voice heard: VFIO, virtualization, all of this. That's the reason I'm giving you this background video on the Level1Linux channel. Second thing: if it's working for you, let us know, and let us know what is working for you, and your BIOS version and everything else: things like PCIe AER, making sure IOMMU is enabled and not on Auto, and not using things like the PCIe ACS override. Those things can get you into trouble.

With where we are in 2023, with modern hardware like X670 and to a lesser extent B550 and X570, you should be able to configure the BIOS on most boards so that you can run all of the stuff that you need for proper IOMMU isolation. It's not enough to just enable IOMMU anymore; you also have to tell it that you want the advanced PCIe features. Not every BIOS exposes those options; not every board has them. You also have to understand where the slots on your motherboard come from: whether they're PCI Express lanes connected directly to the CPU or PCI Express lanes connected through the chipset. I know some people don't care to run their host GPU through their chipset, or they want to use their iGPU for their virtual machine but an add-in GPU for the Linux machine. Everybody's got a different use case, and some of those use cases are a little more tricky than others, and I don't know if I'm going to be able to cover all of them in the FAQ and the how-to. But I've noticed that, you know, on Reddit, r/VFIO, given a recent Reddit kerfuffle and the acceleration of the VFIO community and blah blah blah, I need to do a little work here to try to bring everything back together, because some of the enthusiasm has gone away, it seems, and
some of the documentation is getting a little bit stale. So it's like, okay, let's bring it together, let's get the documentation good to go, let's do this. Early on I had some problems with 7000 series GPUs, and with BIOS updates and AGESA updates, and with being able to load the BIOS for the GPU from the virtualization side instead of relying on the card's own BIOS, which is another annoyance but something we can work around in software. So there is a path forward here. This video is not the path forward; this video is just a call to arms. So don't get ahead of yourself and be like, "He didn't tell me anything that was immediately actionable." If you want the immediately actionable stuff, you're going to have to go to the Level1 forum. It's not going to be in a video format, I'm sorry. But we're going to talk about cool stuff that we're doing.

Also, a reminder: gnif, Looking Glass. The Looking Glass project has had phenomenal updates in the last little bit: signed drivers for monitor emulation, and the monitor driver actually does the memory buffer pass-through from your guest to your host. Yeah, it's a thing, almost, pretty much, sort of, kind of, where you load the Looking Glass driver for the virtualized monitor, and that is able to scrape the frame buffer and forward it to the host, which is the coolest thing ever, at ridiculous speeds, with pointer sync and the best performance that you've ever seen on modern platforms.

Again, this is why we're going to give away a 16-core monster system with two GPUs and two 2-terabyte hard drives, thanks to Corsair, plus the 512-gigabyte operating system SSD, plus the memory. It's basically the ultimate system from Corsair and AMD partners like XFX, ASRock, and some others. So, very, very exciting stuff. Look for that thread in the Level1 forum, linked below, for this community project that we're putting together. I want to get my ducks in a row with VFIO. I want to hear how you're using it, and so on and so forth. I myself have
been using my VFIO system based around Threadripper Pro. It's a build very similar to this; I did that wood-grain video, like, years ago, and it has been rock solid. It has done everything that I've thrown at it. I've passed through different kinds of GPUs, I've swapped my GPUs around, and it has been absolutely rock solid. I even did the old how-tos where you can run your Microsoft Office applications seamlessly. I've been using that for years, and it works great. I really do think it's the future of computing, and also being able to separate my concerns in kind of a quasi-secure way, secure-ish, and philosophically I like being there. It's definitely not as secure as something purpose-built like the Qubes project, but philosophically it's kind of the same idea. If you have never heard of Qubes, you should definitely check out the Qubes project and watch some of their stuff, and maybe watch some of my old videos on Qubes, though they're pretty outdated at this point. Qubes is another really fun, awesome project in this same kind of vein. It's an offshoot, but let's not get distracted by that. Let's stay focused on making the VFIO experience a truly first-class experience again, because we were so close, so close.

Also, I'll give you one other tidbit: these Arc GPUs, the A770 for example, which is becoming a first-class experience in Linux. I've had this for a while. It has not been super fabulous in Linux, but this GPU has come a long way on the Windows side of things. In China there is a hack, let's call it that, that will turn the A770 into its data center equivalent, which supports a multi-tenant GPU technology called SR-IOV. So it is possible to turn an A770 into a two-tenant SR-IOV device where each tenant has eight gigabytes of memory on a 16-gigabyte card. You can run this GPU with a guest operating system, giving it eight gigabytes of VRAM, and you run your virtual machine and your host operating system off of a single GPU. That is possible with the A770, but you
do have to modify the card. And a lot of these cards, I mean, these have almost 99.8 percent complete single root I/O virtualization support, and I'm trying to figure out how I can show you how to do that without tripping over some landmines, because there are some landmines there, and that's all I can say. Maybe just telling you that is enough that somebody can figure it out and connect the dots. I don't know; we'll see. But an A770 with SR-IOV support: it may turn out that Intel is the first company to offer this, intentionally or unintentionally. Actually, I think there are beta versions of that card that have SR-IOV enabled. I'm not sure if that was accidental or not, or just leftovers from the fact that the original commercial version of that card was built for, I guess, Tencent or somebody to be able to play mobile games online or something like that. There's somebody in our community that worked on that project, apparently, and sent me some info, so that's sort of interesting. But that's a story for another day.

I'm Wendell, this is Level1. VFIO is back, Level1Linux, hello and welcome. We're going to get this done, we're going to put in a lot of work, and it's going to be really something. Thanks again to Velocity Micro for sending the system out. I've got a link below you should check out if you are interested in just, you know, point and click and buy a workstation. You don't want to DIY it; you can spend less money if you DIY, but if you're buying it for work you want a support contract. It's cheaper than Dell, HP, Lenovo, etc., and it is truly a workstation, because Threadripper Pro is a workstation platform. The 7950X is an enthusiast platform, with two GPUs, and we're going to give that one away, so check that out. That's going to be something that I build. But if you want something that has 32 or 64 cores and a ridiculous amount of horsepower, Threadripper Pro is still killing it. Even though these
systems are a couple of years old, this is the fastest workstation CPU you can get out of the box. I'm Wendell, this is Level1, I'm signing out, and you can find me in the Level1 forums, working endlessly on this VFIO thing. Oh, I've already put in a scary amount of work. This video was supposed to be out kind of a while ago; sorry, I'm working on it. Things are improving. All right, I'm signing out. I'll see you there. [Music] [Music]
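
The IOMMU isolation point raised in the captions (check that the GPU you want to pass through is properly isolated, and avoid leaning on the PCIe ACS override) can be sketched in Python. This is a minimal sketch, not something shown in the video or taken from the forum how-to. It assumes a Linux host that exposes IOMMU groups under /sys/kernel/iommu_groups, and the function names (iommu_groups, shared_group_members, uses_acs_override) are hypothetical, made up for illustration. The pcie_acs_override= check assumes the boot parameter used by the commonly circulated out-of-tree ACS override patch.

```python
from pathlib import Path


def iommu_groups(sysfs_root="/sys/kernel/iommu_groups"):
    """Return {group number: sorted PCI addresses} read from sysfs.

    An empty dict means the directory is missing, which usually
    indicates the IOMMU is disabled in firmware or on the kernel side.
    """
    groups = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return groups
    for group_dir in root.iterdir():
        devices_dir = group_dir / "devices"
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        # Each entry under devices/ is named after a PCI address,
        # e.g. 0000:03:00.0
        groups[int(group_dir.name)] = sorted(d.name for d in devices_dir.iterdir())
    return groups


def shared_group_members(groups, pci_address):
    """Return the other devices in the same group, or None if not found.

    Every device in a group generally has to go to the guest together,
    so a GPU that shares a group with unrelated devices is not isolated.
    """
    for members in groups.values():
        if pci_address in members:
            return [m for m in members if m != pci_address]
    return None


def uses_acs_override(cmdline):
    """True if the kernel command line carries the ACS override parameter.

    Assumption: the parameter name comes from the out-of-tree ACS
    override patch, not from the mainline kernel.
    """
    return any(arg.startswith("pcie_acs_override=") for arg in cmdline.split())


if __name__ == "__main__":
    for number, devices in sorted(iommu_groups().items()):
        print(f"group {number}: {' '.join(devices)}")
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists() and uses_acs_override(cmdline_file.read_text()):
        print("note: pcie_acs_override is set, so the groups above may not reflect real isolation")
```

Run on the host, this prints one line per IOMMU group. A pass-through GPU would ideally share its group only with its own functions, such as its HDMI audio device.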
Info
Channel: Level1Linux
Views: 35,814
Keywords: technology, science, design, ux, computers, linux, software, programming, level1, l1, level one, l1Linux, Level1Linux
Id: HO_8liPirns
Length: 19min 29sec (1169 seconds)
Published: Thu Aug 17 2023