Understanding Intel's Quick Assist Technology

Captions
QuickAssist. No, I don't mean the Quick Assist thing that's built into Windows, like a help desk thing. I mean Intel QuickAssist, QAT. Intel's betting the farm on their accelerators, so let's give it an honest, fair look. I've got QuickAssist hardware, and if you've got QuickAssist hardware, you definitely should be using it. I've done some videos in the past on that, but I want to show you just how easy it is to get up and running with it, and explain a little bit more about why it's so important.

You see, QuickAssist is a CPU feature that's outside the core. It's an accelerator, and it's probably going to be part of the next big battleground in the CPU core wars, at least for server CPUs. It's completely flying under everybody's radar. QAT takes an approach to data compression and decompression, and encryption and decryption, that really is pretty elegant, and probably the best approach for dealing with large blocks of data. Intel actually has a 10-year head start with QAT, but are they going to blow it? If you've come to depend on QAT, it's hard to give it up, and I think Intel's product segmentation up until now has been part of why it hasn't been as widely adopted as maybe it should have been. I hope that Intel doesn't blow it. I actually do like the technology. AMD's approach here is different from what Intel has done with QuickAssist, and we'll talk more about that in a second.

But QuickAssist is also still not well understood, because it's something that's outside the CPU core. It could actually be really, really important for the next generations of CPUs going forward. But why would it be a battleground? What sense does it make to put something on a CPU that's not a CPU, that's not part of the core? Well, that's what this video is for. Let's take a closer look. Let me show you how easy this is.

[Music]

All right, here on our system we're setting up ZFS to do compression with
QuickAssist. Well, QuickAssist can help you do compression with ZFS? Yeah, because it's just a generic accelerator. It's made for processing lots of data, and it's versatile. This is experimental support in ZFS, hence this experiment. The level of support for QuickAssist that's already here is mind-blowing. We'll take a look at a virtualization distribution like Proxmox, and there are some really awesome options there, but to start we're going to take a look at vanilla Ubuntu 22.04 LTS. I installed it on our dual-CPU Sapphire Rapids 8490 system. It has everything, with a total of four QAT accelerators per socket, so that's eight in this chassis.

Engagement challenge: what server CPUs do you have, and do they have QuickAssist? If you're not sure, you can always check the Intel ARK website. Intel ARK is tremendously useful for figuring out what your CPU has built in, and some of these features can actually be unlocked with a license key or other maneuvering at a later point. Anyway, you don't actually need the top-end 8490H to get these accelerators. I've got another couple of systems we'll take a look at in a little bit, but for now, yeah, Xeon Platinum, top of the line, kind of makes sense, although top-of-the-line CPUs don't necessarily have it.

The ZFS file system is handy in that it can compress data transparently, but I'm modifying it to pass the compression part of the file system through to the QAT part of the CPU. I wrote a guide on the Level1Techs forum, and you can follow along with this video or look at that, but there's not a lot to do. Basically, can you run lspci and see that the device showed up? Okay, it did. And not only that, if we check lsmod, we see that the kernel has already loaded the kernel module for this. That means we've got the firmware and everything else, from the Linux firmware image in our case, so the package is going to bring down the firmware images for us. It's pretty awesome: if your kernel's up to date and you get the Linux
firmware package and it's a reasonably recent distro, the device is already there in /dev, at least to get started. There's some more stuff you can do to add more devices and other fun things, but yeah, it's basically there. The big part that we're missing is the QAT libraries. Those are Intel's libraries to get stuff done with this, so we'll have to download and set those up. That's basically all we have to do to compile ZFS with support for QAT: get those libraries and make sure those devices exist.

And before you say, "Well, modern CPUs are actually really good at software compression": yes, that is true, and if you shelled out for an 8490 you'd hardly notice a difference between QAT and non-QAT compression. If you want to do Zstandard up to level 9, you've got cores for days, so why don't you just do it on a core? But what if this server is fully loaded? With QAT, whatever job you're running on your CPUs, the one that's making the workload, is not competing with the compression job that's part of the file system. Your CPUs can be busy doing the workload but also busy with background jobs, and the more background jobs you can offload into those accelerators on the CPU, the more room you'll have for the actual job you're trying to run. This is also kind of like using a GPU for acceleration. You've got movies you want to transcode, and it takes forever on the CPU, so you do it with the GPU if you can.

But this is the highest-end Xeon Platinum system. What about something else? I mentioned you don't need the 8490. How low can you go? Well, I've got this Emerald Rapids system that I borrowed from Supermicro. This is not a top-of-stack CPU system, but it is an insanely powerful system with quick memory bus speed. It's very far from the top of the stack, but it's also not quite low-end either. The CPUs in here cost about a third of what the 8490Hs cost, and it's almost fast enough to handle full-disk encryption at 100 gigabytes per second from an array of
Kioxia CM7s in the front. And yeah, it can saturate 400-gigabit Ethernet with ease when we're talking about encrypted TLS connections, too. So you don't have to have the top of the top. This system is not at all top of stack for Intel, but it's the perfect system for us, one where neither the disk nor the network card is the bottleneck.

But what about even lower? Well, that's sort of part of how QAT came to be as well supported as it is. I'll grant you that it's low-key, but it is ubiquitous. How low can you go? If we take a detour for a second and talk about Atom cores: Atom cores can have QuickAssist. pfSense, for example, a router distribution of FreeBSD that's really popular with homelabbers and very popular in the commercial space, uses AES-NI for VPN acceleration. That's a CPU core instruction set that accelerates crypto. But it also has support for QAT, and QAT is about four to six times faster than even AES-NI, especially on low-end hardware. Modern internet routers do so much encryption and decryption that AES-NI is basically not enough; it's a minimum requirement at this point.

Oh, and while we're on this detour, how this came to be is interesting, so let's go back in time a little bit further. Why not, say, Atom cores? I mean the original Cherry Trail Atom cores. Intel was facing a problem. We're talking almost a decade ago, well, not quite, but competitors were making inroads in telecommunications and other gear that needed cryptographic functions that take a lot of CPU horsepower. They were using add-in hardware and specialized chips to do that acceleration. So if you want to run a cell phone tower with thousands upon thousands of connections, it's going to take monster, expensive server CPUs, right? No. You didn't really need a lot of CPU horsepower just to shuffle data in and out from radios to fiber optics, because in those days there wasn't very much that was encrypted. But when somebody rolls up and pretends to be the neighboring cell phone tower, or somebody wants to impersonate you
on your fiber backhaul, well, that's not very secure. The solution was to encrypt everything on both sides: not just cryptography for security, but cryptography to make sure that there's no tampering between the endpoints that are communicating. Compressing the communications also gives you extra bandwidth, so you don't have to run wires or even increase the speed of the existing telecommunications gear that you've got.

This is where Intel's competitors really were making some strides. Competitors could offer specialized hardware to do that security assurance, crypto, and compression. Basically, just plug in these third-party chips or controllers, or deploy them on a board, and let the x86 and the Intel stuff do the Intel stuff. You've got these boxes that have specialized hardware alongside x86, and that takes care of it. The specialized chips would do the work, and the x86 cores were just used for control and management. You actually still see that in some Ethernet switches today. We covered Arista switches in the past: it's a little tiny x86 for control and a big, giant Broadcom chipset to do the Ethernet part of it.

Now, Intel could solve this problem by giving telecoms a monster many-core CPU with something like AES-NI on 56 or 64 cores. That would do it, but those customers in the telecom industry that wanted compression and encryption were not willing to pay as much for a many-core CPU as data center customers were already paying for the same CPU. It's a bit of a market segmentation problem for Intel, and the telecom customers wouldn't be using most of the rest of the functionality in an x86 core anyway. Thus were born QuickAssist and accelerators like it.

Also, kind of in parallel and at the same time, other markets were finding creative uses for Intel E3 Xeons, the lowest-end Xeons, but also the only Xeons that came with a GPU. It was very modest as far as GPUs go, but people were finding creative ways to have those GPUs do workloads, and that's sort of the
environment that QuickAssist was born in. I mean, QuickAssist paired with modest, and I do mean very modest, x86 cores could encrypt and decrypt at 10-gigabit wire speed, and that's when 10 gigabit was blisteringly fast. Without QuickAssist, those cores would never have been able to manage 10 gigabit, probably not even 2.5 gigabit, probably more like 700 megabit. Hey, wait a minute, that's how fast my pfSense router at home could do on the VPN before AES-NI support became mandatory. Yeah, we'll get to that. If you have a homelab, or you have experience with pfSense in a commercial setting, you've probably discovered with your repurposed old hardware that without AES-NI it really can't do certain things super fast. Now you know why: because you need hardware assist. QuickAssist does kind of work more like a video encode on a GPU than like CPU instructions, where a CPU core does the work, but that's probably a story for another day. I've rambled too much on this.

This is our 8490 that we're remoted into, and just out of the box you've got the one adf_ctl, so it's looking pretty good. QAT 4xxx: there are different versions of QAT that support different algorithms, and they also run at different speeds. It could not possibly be more confusing. You've seen the AVX-512 Venn diagram of human suffering; QAT versions are kind of a little worse. Because we haven't set up any of the QAT user library stuff or services, or followed the guide, the QAT service doesn't exist. These are eight QAT devices in lspci. Remember to check Intel ARK, because you might only have one or two or four depending on your system.

It's also true that sometimes when you update the QAT library version, you get new crypto support. Phoronix has a good article on that; you should check it out. Basically it's a software update, and it's like, oh look, new cryptography standards are supported with the same old QAT hardware. That's pretty much it. I wish it was more complicated than that, but on modern
distros it almost works out of the box. You just need to do a couple of steps to get the user libraries and you're good to go. What's not to love? I can't stand the thought of all that compute just sitting there idle. It could be doing stuff. So QAT is a way to compress and encrypt in the uncore. We showed the nginx use case before: you can offload TLS, the web encryption, and that really sped things up more than I expected. I was sort of shocked by that, which is why I'm doing this.

By the way, this system, with the device and the modules and the firmware, is ready for Docker just like in the other video, so if you want to just start using Docker, you're good to go. So what Xeon do you have access to? Use Intel ARK to see if you've got QAT. It might be worth some experiments on your own. For Docker on this system, if you want to go that route instead of ZFS, or both, all you need to do is add a QAT security group, give it access, and give Docker host access to the QAT devices. That's in the Intel docs, but there are also some hints about it in my guide on the forum. From there, you're ready to run Docker containers that can leverage QAT. Out of ideas for Docker containers? Well, Intel's Open Visual Cloud GitHub repository has a lot of good resources that will use QAT and some of the other Intel hardware facilities. And check out my other videos and forum posts on getting this kind of thing set up with nginx.

But what about Windows? I did a Falcon Northwest 56-core system as the ultimate machine learning system, and I'm still doing that with CUDA and AMX, and there is no faster machine for OpenVINO. But this is based on a Xeon W CPU. Xeon Ws don't have QuickAssist, and I think this is kind of a strategic mistake by Intel. They're putting AMX and AI in basically everything, and OpenVINO goes faster than anything on this platform, but QAT would really be useful from a developer standpoint, even if it's an anemic implementation of QAT, just so I
can run my local Docker containers and then migrate them from dev to production. I mean, it's sort of a misstep. But there is really good support for QAT in Windows with Microsoft SQL Server. Glenn Berry, a Microsoft SQL Server MVP, has covered it on his blog, and we've done videos in the past. You can run QAT hardware acceleration with a huge benefit. Say you've got a 16-core system that's fully loaded and licensed by the core. If you have the QAT accelerator, you get the backup compression for free, basically, because it doesn't run on a core. So if your SQL Server is fully loaded and you need to run a backup, your users are going to notice, unless you're using QAT hardware, in which case it's less likely that they'll notice, because the compression is not competing with the SQL operations on the CPU cores. It's running somewhere else. That's pretty awesome. And QAT doesn't count against the licensing, so you don't need to buy extra cores and try to cordon off some of them. You should check out Glenn Berry's blog posts and the videos that we've done in the past. And yeah, that's on Windows Server; it's full support there.

So, to take us out: I mentioned this as kind of a battleground. Intel has embraced chiplet architecture. They have silicon for compute and they have silicon for I/O. Compression, encryption, and data transformation jobs are handled by a QAT-type accelerator, not on a core but on the I/O die, off of the CPU. And yeah, with all that transistor count and real estate in an I/O die, it makes sense from a design perspective. While AMD could add similar resources inside their own I/O die, it seems like their strategy is different. For example, with their Pensando acquisition, the same type of encryption that I showed with QAT on the Intel CPU in the last video is more equivalent to what they're doing on the Pensando network card. So in this case the
network card is doing all the encryption and decryption, offloaded from the CPU. It's not an on-CPU resource, and that kind of makes sense from a TLS standpoint, but this is a fundamentally different approach from what we see with Intel and what they're doing inside their CPUs. As good as QAT is, I think Intel is being very, very careful to segment their products, and that has probably cost them some adoption of this kind of technology.

Can an x86 core get good enough to handle wire-speed 100-gigabit encryption? Well, so far it hasn't. It needs some type of acceleration to do this. You're going to burn all of your CPU overhead trying to handle very small packets and encryption, when that is really better served by an ASIC that's built for the job. Now, to be sure, designing a CPU is a balancing act where you've only got so much real estate, so many transistors, and Intel and AMD have gone in different directions from a system architecture standpoint. Instruction sets like AVX-512 and AES-NI can help with some of these encryption workloads, but it's still not as efficient or as elegant as QAT. If you're rocking pfSense, for example, QAT is dramatically faster than AES-NI.

Oh, and I mentioned Proxmox. In addition to this basically being ready for plug and play, like we just set up Linux and the QAT devices were there and it had the firmware, with Sapphire Rapids and Emerald Rapids you can do SR-IOV on your QAT devices. But I'm going to have to save that for the next video, because I'm already running a little long. If you want to jump ahead, the write-up on the Level1 forum will get you started. Basically all you've got to do is enable IOMMU, turn on SR-IOV, and set some kernel parameters, and then the kernel will actually populate the VF PCI driver for those QAT devices, which you can then pass through to a virtual machine, and that can run Docker and run your QAT machines and all that kind of stuff. And the hardware will handle scheduling
all the different QAT jobs on your eight physical QAT accelerators, at least in the case of the Xeon Platinum, or the one or two QAT accelerators that you might have in your other platform. But that's going to be a video for another day.

Engagement challenge: what accelerators do you have access to? Are you using them? Tell me a little bit more and help me shape the next couple of videos in this series, because I've got some kind of high-end hardware here and I've been taking it for a spin. I've been surprised in some ways, and in other ways it's like, ah, I sort of get what Intel is doing with their overall strategy. It's a bold move, Cotton. Let's see if it works out. I'm Wendell, this is Level1, and I'm signing out. You can find me in the Level1 forums.

[Music]
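
The "did the device show up in lspci, did the module load in lsmod" check from the captions can be scripted. What follows is only a minimal sketch and not the procedure from the forum guide: the function names are made up for illustration, the sample text is invented rather than captured from the 8490 system, and it assumes lspci describes the accelerators with "QAT" or "QuickAssist" in the line, which depends on the local PCI ID database.

```python
import re
import shutil
import subprocess

# Assumption: lspci labels the accelerators with "QAT" or "QuickAssist".
QAT_PCI_PATTERN = re.compile(r"\b(QAT|QuickAssist)\b", re.IGNORECASE)


def find_qat_pci_devices(lspci_text):
    """Return the lspci lines that look like QAT devices."""
    return [line.strip() for line in lspci_text.splitlines()
            if QAT_PCI_PATTERN.search(line)]


def find_qat_modules(lsmod_text):
    """Return loaded module names that belong to the in-kernel QAT driver."""
    modules = []
    for line in lsmod_text.splitlines():
        fields = line.split()
        # intel_qat is the shared core module; qat_* are per-generation drivers.
        if fields and (fields[0] == "intel_qat" or fields[0].startswith("qat_")):
            modules.append(fields[0])
    return modules


def command_output(command):
    """Run a command and return its stdout, or '' if the tool is not installed."""
    if shutil.which(command[0]) is None:
        return ""
    return subprocess.run(command, capture_output=True, text=True).stdout


# Invented sample text, only to show the expected shape of the input.
SAMPLE_LSPCI = """\
6b:00.0 Co-processor: Intel Corporation 4xxx Series QAT (rev 40)
70:00.0 Co-processor: Intel Corporation 4xxx Series QAT (rev 40)
00:1f.0 ISA bridge: Intel Corporation Device 1b81 (rev 11)
"""

SAMPLE_LSMOD = """\
Module                  Size  Used by
qat_4xxx               49152  0
intel_qat             352256  1 qat_4xxx
nvme                   53248  2
"""
```

On a real host you would feed it `command_output(["lspci"])` and `command_output(["lsmod"])` in place of the sample strings, then compare the device count against what Intel ARK lists for your CPU.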
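
The point about software compression fighting the real workload for cores can be made concrete by measuring how much CPU time a compression pass burns. This is a rough sketch under stated assumptions: it uses Python's standard zlib module as a stand-in (the captions talk about Zstandard in ZFS, which is not what runs here), the function name is hypothetical, and it says nothing about how fast QAT itself would be.

```python
import time
import zlib


def measure_software_compression(data, level=6):
    """Compress on a CPU core and report the CPU time that work consumed.

    zlib stands in for whatever codec the file system would use; the CPU
    seconds reported here are time the foreground workload could not use.
    """
    start = time.process_time()
    compressed = zlib.compress(data, level)
    cpu_seconds = time.process_time() - start
    return {
        "ratio": len(data) / len(compressed),
        "cpu_seconds": cpu_seconds,
        "compressed": compressed,
    }
```

Calling it on a large buffer, for example `measure_software_compression(b"level1 " * 5_000_000)`, gives a feel for the per-gigabyte CPU cost that an offload engine would hand back to the workload on a fully loaded server.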
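
For the SR-IOV steps mentioned near the end (enable IOMMU, turn on SR-IOV, set kernel parameters), the "turn on SR-IOV" part on Linux generally goes through the standard PCI sysfs files sriov_totalvfs and sriov_numvfs. The sketch below assumes the QAT driver exposes those files for its physical functions and that IOMMU is already enabled (commonly via the intel_iommu=on kernel parameter); neither detail is confirmed by the captions, the PCI address in the usage note is a placeholder, and the forum write-up remains the actual reference.

```python
from pathlib import Path


def sriov_paths(bdf, sysfs_root="/sys"):
    """Return the (sriov_totalvfs, sriov_numvfs) paths for a PCI device."""
    device_dir = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf
    return device_dir / "sriov_totalvfs", device_dir / "sriov_numvfs"


def enable_vfs(bdf, requested=None, sysfs_root="/sys"):
    """Ask the kernel for virtual functions on one device; return the count set.

    requested=None means "as many as the device advertises". Needs root when
    pointed at the real /sys.
    """
    total_path, num_path = sriov_paths(bdf, sysfs_root)
    total = int(total_path.read_text().strip())
    count = total if requested is None else min(requested, total)

    current = int(num_path.read_text().strip())
    if current not in (0, count):
        # The kernel rejects changing one non-zero VF count to another,
        # so drop back to zero first.
        num_path.write_text("0")
    num_path.write_text(str(count))
    return count
```

As a usage example, `enable_vfs("0000:6b:00.0", requested=4)` would request four VFs on the device at that (hypothetical) address; the resulting VF devices are what you would then pass through to a virtual machine. The sysfs_root parameter exists only so the logic can be exercised against a scratch directory.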
Info
Channel: Level1Techs
Views: 27,104
Keywords: technology, science, design, ux, computers, hardware, software, programming, level1, l1, level one
Id: bCoED2XN1Zo
Length: 18min 56sec (1136 seconds)
Published: Fri Feb 09 2024