How to Set Up SR-IOV with Intel Flex 170

Video Statistics and Information

Captions
[Music] Good news, everyone! At least, everyone that's using Linux, although you can use Windows with it too: SR-IOV is here on basically everything from Intel. Almost, sort of, kind of, pretty much. This is a portent of things to come: SR-IOV finally entering the mainstream.

What is SR-IOV? It stands for single root I/O virtualization, but basically it just means that you have a single piece of hardware and there's a way to share that single piece of hardware among several virtual machines. Why do you want to share a single piece of hardware among virtual machines? Well, because you can do multiple, way different things with a single piece of hardware without having to reboot or change your operating system. You can run the cool Windows stuff and the cool Linux stuff on a single piece of hardware at the same time. It also has lower overhead than paravirtualization. Think of a network card: if you run a virtual machine and it needs networking (all virtual machines need networking), then you can emulate a network card in software, which has a lot of overhead and is not optimal for a lot of reasons, or your network card can have functions that allow it to be passed through to a virtual machine like you would pass through physical hardware.

Well, the NIC in the Minisforum MS-01, the Intel NIC, the 10 gig SFP+, has SR-IOV enabled, and so you can do exactly that with networking. But also Intel on their iGPUs: it's not officially supported, but you can enable it anyway. And the Intel iGPU, not a lot of horsepower, at least not today in 12th, 13th, and 14th generation Intel iGPUs, but maybe someday we will see a NUC that has some reasonable onboard graphics horsepower. Actually, that's kind of a hint of things to come. We also have the A770, a consumer-class graphics card that is in the superposition of both having and not supporting single root I/O virtualization, meaning that an Arc A770 can theoretically support multiple virtual machines at the same time. Kind of a promised land, a holy grail. And then at
the low end of the enterprise (or not really, sort of, kind of) there is the Flex 140 and Flex 170. I just recently did a video on those. It's basically the same hardware as an A770 Arc GPU. A little different, but it's close. And that supports a lot of virtual functions through SR-IOV, and actually non-SR-IOV technologies that are specific to VMware and other virtualization platforms that make even better and more efficient use of it. That one physical piece of hardware can be shared among a bunch of virtual machines, and it actually does accelerate things and makes it go better. In fact, if you use a remote desktop machine (sometimes it's called virtual desktops in the enterprise) that has hardware acceleration, and then they ask you to go back to one that doesn't have it, you'll be very upset, because once you've used it and you experience how awesome it is, you won't want to go back.

So this video is to walk you through setting up SR-IOV, not on your Flex 140 or Flex 170, but on your Intel iGPU, which is not officially supported, and maybe some other hardware in the not too distant future. Waiting on kernel 6.8: some things are supposed to align in terms of firmware and driver functionality in the magical promised land of kernel 6.8. We're not there yet, but we are going to take a look at kernel 6.5 on Proxmox 8.1.

For this video I'm using the Minisforum MS-01. I actually have it sitting on a desk; this is just the case for it. I don't have more than one, so it's running in the background here. But this would apply just as easily to any 12th, 13th, or 14th generation system. Remember the ASRock DeskMeet? This is a perfectly reasonable home server solution as well. It's not like the Minisforum MS-01: it doesn't have built-in 10 gig and it's a little tricky to add it. You can do the M.2 cheat codes (I did a video on that), where you can add peripherals via the M.2. This is meant to be just a workhorse business machine, but it's inexpensive and it's very powerful. It's LGA, full-fat LGA 1700,
so you can run a 65-watt processor in here and a single discrete double-slot GPU. This would be good for, like, the machine learning stuff, though. Like if you wanted to run the RTX 4000 in a machine like this? Oh yeah, that would work out really well. A little overkill. But this will do SR-IOV as well. Let's get to the how-to on the forum.

All right, first things first. Before you do anything, go into your hardware and make sure that SR-IOV is enabled in the BIOS. You've also got to make sure that VT-d is enabled so you can start virtual machines. I'm assuming that you don't need a tutorial on how to install Proxmox. There's a tutorial for that on the forum that we did ages ago, and tons of people have done Proxmox installation tutorials. You want to get that set up on the Intel machine that you're going to run this on.

As far as I know, nothing like this is supported on AMD except for FirePro, like ancient FirePro graphics cards. There are some Instinct cards you can get this working on, but you have to go download drivers that were only ever released to Amazon, and you'll probably need vendor-reset because the cards don't reset properly. I've emailed them. I don't know that it's really on AMD's radar to worry about such things.

But you want to get virtualization set up and running, and with lspci make sure that you can see your Iris Xe Graphics. If you already see more than one Iris Xe Graphics, congratulations: somehow you've teleported into the future where this is just something that works out of the box. It's probably not going to be like that, though.

All right, the second thing to understand is that there is this Git repository called the i915 backports on GitHub. This is the source where Intel is doing a lot of their work now. On this repository, don't make trouble, because Intel does not want to be bothered with these off-label use cases, and they don't have a huge number of people working on this. And the fact that SR-IOV works as well as it does is a
little bit of a miracle, because the business people inside of Intel seem to have different ideas than the engineering people. And so you can see this incredible passive-aggressive war being fought between leadership not understanding what this functionality is all about, and the engineers understanding it and getting it and trying to get this out there so that people see how awesome they're doing. And Intel really is doing awesome with SR-IOV, especially when we talk about Flex 170 and Flex 140. Ponte Vecchio I'm not sure about; I've got to get my hands on Ponte Vecchio. I borrowed remote access to a Ponte Vecchio system for a little while, and the inferencing performance seemed like it was off the chart, but other performance was not as impressive. That could just be me; that could just be oneAPI not coming together.

But SR-IOV and what Intel is doing here, their whole driver universe for Xe graphics and Arc and Flex 170: it looks like they're bringing that all under one repository, under one house, which is great from an architectural standpoint, because you don't get into weird situations the way that we see with other companies, where the left hand has no idea what the right hand is doing, and the left and the right hand assume that everything that needs to happen has happened, when in fact nothing that needs to happen actually happens. Which is frustrating.

Anyway, GitHub, the driver, everything going on with GitHub: this is a great repository, it's a great resource. You should look at the commits and the diffs. There's really not a lot going on in terms of pull requests. There is actually one pull request that fixes some null pointers that hasn't been merged in, and the reason it hasn't been merged in is because it fixes null pointer problems with the iGPU on, like, Haswell generation. I think anything older than 12th gen is going to be problematic going forward, looking at kernel 6.8 and the driver architecture, especially for these off-label uses. So maybe keep that in mind if you're thinking about this for
older stuff. But the null pointers is a thing that I experienced on 12th gen that those patches can take care of when you're using SR-IOV.

Now, SR-IOV, like I say, is not officially supported on the iGPU, but if you download this repository and you check out backport/main, you can actually compile this on kernel 6.5. If you're on, like, Proxmox 8.0, it's an older 6.2 kernel, and it's not even going to compile cleanly, so that's a little bit of a problem. But the Intel engineers really are putting a lot of work into supporting kernel 6.5, and because they're putting a lot of work into supporting kernel 6.5 on their commercial products, it spills over into the iGPU stuff, which is great for us. And also technically the A770, but I haven't shown you that yet. Well, actually, somebody in China beat me to that, but that's a story for another day.

So there's a lot of good stuff in this repository. The problem is that some of the good stuff for the iGPU is not allowed to exist in this repository, directly or indirectly, because of the infighting and everything there. One of the best repositories on GitHub for the iGPU is from "strong time zone" (Strong Bad's lesser-known cousin), strongtz. So check out that Git repository and those kernel updates. This is specifically for Proxmox: these are Proxmox users that are already using it, and they're doing the Lord's work in terms of dotting the i's and crossing the t's and really getting this up and running with SR-IOV.

Now, no one is actually going to use an iGPU for SR-IOV to support actual end users. It's too unstable. You get Code 43, or the system is stable for like 10 hours and then it reboots, and other bad things happen. But for running your home Proxmox server, or being able to run Docker containers, or being able to run Ubuntu? Yeah, it's great. It works fine.

So the how-to that I put together on the Level1 forum is basically: okay, start with the i915 backports, the official Intel repository, here
are the steps that you go through on the forum to get it up and running. But you should be aware that that's not necessarily going to solve your problem, and it's not meant to, when you're using it with an iGPU on, you know, the Minisforum MS-01 or your DeskMeet or whatever it is you're trying to get running on your iGPU. And not every system supports SR-IOV; you're not even going to have the BIOS option to enable it, necessarily. So strongtz is managing that chaos for us, and that is also a pretty good starting point.

Now, when you do clone this and you do use it, it's still a little bit of an uphill battle. You have to know a little bit about software development. With the Intel i915 backports, at the time that I'm doing this, there's a bug in their makefile, Makefile.backport: they're using tabs instead of spaces when it's doing some compatibility checks. I just comment out those lines (that's in the how-to on the Level1 forum), and then it will build the DKMS module that will let you test it.

Firmware is also an issue. Proxmox doesn't include the i915 firmware version that you need, the GuC, for this. And if you're doing off-label stuff with the A770 or anything like that, you also need firmware that's not even necessarily going to be in the Linux firmware repository, but the drivers look for those versions. So I ended up not only having to download a firmware version from the Linux kernel website, but also having to create a symlink for the specific firmware version it was looking for. Now, if you look in the Git commit history for that firmware file version, it actually was the specific version that it was looking for, and even a newer version. It was like .13 to .20, and .20 was from February of 2024; .13 was from like October of 2023, and the driver was looking for that version of the firmware. But you'll have to pull that and put it in /usr/lib/firmware/i915. There's also another thing
going on here, which is that i915 is being supplanted by Xe. And so there's sort of a branch within a branch in the driver development here that is Xe graphics. Technically, these 12th, 13th, and 14th gen parts, they want to refer to them as Xe graphics, even though they're closer to UHD 770 than they are to what we see in, like, the A770 GPU. But there are future embedded, notebook, etc. GPUs that really do have Arc graphics and Xe graphics, something a lot better than what we have with the embedded graphics on these CPUs, that will be more useful if we can slice those up with SR-IOV. Because when we're talking about the iGPU on a 13th gen, it's like a slice of bread, just the one, being subdivided into up to seven virtual functions. That is a paper-thin slice of resources that we're talking about here. It's basically just good for your Plex Media Server. So understand that that's what's going on here. If you have the physical room for it, it'd be better to just get another GPU and pass through the whole physical GPU to your virtual machine using Proxmox's standard facilities.

Once you get the repository set up and everything downloaded, the next thing that you do is enable the virtual functions. You can, you know, echo 4 or 7 into sriov_numvfs, the number of virtual functions. When you do that and then you run lspci, you'll see a number of virtual functions there that corresponds to whatever you told it. So there, I've enabled four virtual functions. The root and the .0 are the host part of the graphics; don't add those to your Proxmox configuration. Then you've got one, two, three, and four.

The easiest way to do this is under Datacenter, Resource Mappings, and then Add, and then you should see that in your list. The IOMMU groups column sort of gives you a clue. Here I've got two Raptor Lake iGPU devices in group zero; we're not going to do anything with those. But we're going to check the other four for this mapping device that we create. And this is the
most convenient way to deal with PCI passthrough in this context. This is how you would do it for real if you had a Flex 170 GPU. With that, you should have something that looks like this in your configuration. Then when you go to a virtual machine, Hardware, Add, and pick PCI Device, you have this radio button, Mapped Device, and you can pick whatever you want from the mapped devices. I call this group of GPUs "vgpus", and this will automatically pull one from the pool whenever you have a VM that starts. This is really convenient, because you can have a mix of Linux and Windows VMs, even more VMs than you have hardware resources available, and as long as you don't have more than four or seven or whatever on at a given time, Proxmox will automatically manage mapping and unmapping the devices to whatever virtual machine needs it. That's a pretty cool feature.

Real world, because this is an iGPU and it's not a lot of resources, all this is really good for is doing experiments and passing through functions to, say, Docker, to be able to run an experiment with Docker, or like I say, Plex Media Server: media transcode, media encode. Quick Sync has a few other hoops that you have to jump through, usually, but it depends on what you do in the guest operating system, and the guest operating system is not necessarily going to support this out of the box. So if you pass this through and you're going to do a fresh installation of, say, Ubuntu that you're going to put Docker on in order to run Plex or Jellyfin or whatever in a Docker container, you're not necessarily going to get console output on this device, even though you passed it through. It does actually work like that with the Flex 170, but these iGPU devices have basically incomplete, not fully baked support, and so you end up having to use a SPICE GPU, or multiple GPUs, and then you set one up and then you do that on the other side. That's what I did in the Windows VM here. So if you look at our Windows
VM in Device Manager: sometimes when you reboot your Windows VM, the Iris Xe Graphics is showing Code 43. You can usually get that to go away by directly modifying your QEMU configuration in Proxmox to hide the vendor ID and enable nested virtualization and relaxed Hyper-V stuff. I don't know if it's a driver check or just driver bugs, but that's what I had to do in order to get the Intel Xe drivers to install inside the guest, whereas on the Flex 170 system it was painless. It just worked. It's pretty great. Interestingly, Windows Update will pick up the Iris Xe Graphics automatically, whereas with the Flex 170 it was a manual install.

It also works for you to change the vendor ID. So if you just change the PCI device ID using your QEMU configuration on the Linux side, you can get a little bit better compatibility. It will work around some of the driver issues, which is a really interesting situation. I am sure that Intel is not testing that use case, so your mileage is going to vary there too, but it works. At the time that I'm shooting this video, driver version 5186 seems to work okay for this use case, and strongtz's iGPU i915 repository works a little better than the i915 backports repository using the backport/main branch, for what that's worth. If you use the backport/main branch of the official Intel i915 repository, you're likely to encounter null pointer dereferencing. But if you apply that patch from the issue tracker and do a couple other things, you can get the i915 backports thing working, even with the v20 firmware from the Linux kernel website.

But most people don't care. Most people just want a home appliance to be able to run Home Assistant and Plex Media Server or Jellyfin or something like that, and the hardware transcoding facilities (Quick Sync, basically) in your iGPU are more than enough for most people to run a home server use case. So that's really one of the widest use cases for this kind of thing. This
is more speaking to maybe the Intel people that would be a little hesitant on the suit and executive side. Most of the people that want this functionality just want to be able to have a single machine that can run Docker and Linux and Windows virtual machines and a couple other things, and the iGPU doesn't have enough horsepower for that to even be a threat or a worry. Now, does it make sense for Intel to dump a bunch of engineering resources into supporting that use case? No, because it's not a lot of GPU to begin with. No one should be using this commercially, no one should be tempted to use this commercially, and I assure you, from having to go through the crap in the GitHub repository and having to rely on folks like strongtz, no one will rely on it commercially.

But if we have a gaming GPU that has two or four virtual functions that can be used for this kind of use case, then it opens up a whole new world of possibilities, not just for those home lab use cases, but interesting things on Windows. Look at the Windows Subsystem for Linux. It's taken off. You know, the year of the Linux desktop: there's more people using Linux under Windows, which is definitely a suboptimal Linux experience, than there are people running it natively. Look at Nvidia. Nvidia's CUDA stuff runs better under the Windows Subsystem for Linux than it does natively on Windows, to the extent that Nvidia is basically not doing native Windows development anymore. Okay, there's some hardware driver stub stuff that can't not be on Windows, but I'm talking about the control plane stuff and the automation. It's a very light wrapper to an API that you get on Windows, and there is no Windows GUI, there is no Windows equivalent, because it's better and it makes more sense to just run the Windows Subsystem for Linux for those facilities. This could be the same with an Intel GPU. This could be the same if you're running PyTorch and passing something through. This is already largely the same for containers, and
Intel had a good start with GVT-g for this kind of thing. Intel had a great start with this on their, you know, 1230 v3 and v4 Haswell-era Xeons to be able to support this kind of a use case. And so in some ways Intel has regressed a little bit, because all of those things did not align as much as Intel's graphics stack aligns now. I think everybody really is interested in this kind of functionality for a whole bunch of different reasons, but ultimately the time for the paravirtualized GPU has long since been upon us, and it really does need to be a thing. It is amazing that no one in the industry has managed to get this right in a user-friendly kind of way. It's all about lock-in and proprietary standards and things that aren't amazing. And to be sure, SR-IOV is not an amazing technology. It actually has a lot of shortcomings, which is why the Flex 170 optionally uses something different with VMware, and the thing that you can do on VMware to get a whole bunch of seats on a single GPU, specifically with the Flex 170, is far superior to what you get with SR-IOV. But SR-IOV is open enough, mostly, and standard enough, mostly, that it's the accessibility that's the draw there.

So hopefully you get some mileage out of my tutorial. Check that out on the Level1 forum, let me know what you run into, let's improve the tutorial and go from there. But Proxmox officially supporting Flex 140 and Flex 170, and yeah, you actually can get a Flex 140 and a Flex 170 from Dell and Supermicro and everybody like that. Street price I'm seeing is between $1,600 and $2,800 US, give or take, on that Flex 170. A Flex 140 is two lesser-clocked, lower-VRAM GPUs on one card, half height, half length, so it's a little different critter. But Flex 140, Flex 170: it's the real deal, works really well.

Oh, and if you do need to change virtual functions and other configuration stuff, there's a separate thing that you need to know on the
Level1 forums. You need a CSV file that contains all the parameters that you have to send to the card to change the number of virtual functions and a whole bunch of other stuff. I'm probably going to post that on the forum. Well, it's there already, but I need to post it publicly. If you look at the i915 backports driver, you can kind of suss out how to control all that, but it's really interesting because it's not actually posted anywhere. So I'll probably post that on the Level1 forum too, 'cause it gives you a rough estimate: okay, if you tell the Flex 170 GPU this, you'll get this many virtual functions, and you can roughly count on it doing, you know, one 4K stream or two 1080p streams or four 720p streams for remote desktop or virtual desktops or GPU acceleration or whatever it is that you're working on. I have not produced a similar mapping for the Intel iGPU, but that Intel iGPU really struggled. With seven virtual functions, you're really going to struggle to get more than one or two streams of 4K going at once. There's not a lot of GPU horsepower there, and splitting it up that many times adds a lot of overhead.

So this is Level1. I'm signing out. You can find me in the Level1 forums. Level1Linux. All right, I'll see you later. [Music]
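
The virtual-function step mentioned in the captions (echoing 4 or 7 into sriov_numvfs) could be sketched in Python roughly as follows. This is a minimal sketch, not the procedure from the forum how-to: it assumes the standard Linux sysfs attributes sriov_totalvfs and sriov_numvfs, and the PCI address 0000:00:02.0 plus the helper names read_int and set_num_vfs are illustrative assumptions that do not come from the video. Confirm the address with lspci before using it.

```python
from pathlib import Path

# Assumption: Intel iGPUs usually sit at PCI address 0000:00:02.0.
# Confirm with lspci before relying on this path.
IGPU_SYSFS_DIR = Path("/sys/bus/pci/devices/0000:00:02.0")


def read_int(path: Path) -> int:
    """Read a sysfs attribute that holds a single integer."""
    return int(path.read_text().strip())


def set_num_vfs(device_dir: Path, requested: int) -> int:
    """Ask the driver for `requested` virtual functions; return the resulting count."""
    total = read_int(device_dir / "sriov_totalvfs")
    if not 0 <= requested <= total:
        raise ValueError(f"requested {requested} VFs, device supports 0..{total}")

    numvfs = device_dir / "sriov_numvfs"
    current = read_int(numvfs)
    if current == requested:
        return current

    if current != 0:
        # The kernel refuses to go from one nonzero VF count to another,
        # so disable the existing VFs first.
        numvfs.write_text("0")
    if requested != 0:
        numvfs.write_text(str(requested))
    return read_int(numvfs)
```

Run as root on the host, `set_num_vfs(IGPU_SYSFS_DIR, 4)` would be the equivalent of the echo described above; lspci should then list the extra functions. The setting does not persist across reboots, so it has to be reapplied at boot by whatever mechanism you prefer.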
Info
Channel: Level1Linux
Views: 22,817
Keywords: technology, science, design, ux, computers, linux, software, programming, level1, l1, level one, l1Linux, Level1Linux
Id: aYcntiF4j2Q
Length: 24min 40sec (1480 seconds)
Published: Mon Apr 22 2024