Threadripper 7000 on Linux: The Return of HEDT is Imminent!

Captions
Threadripper is back, but is Threadripper back for Linux workstations? That's what we aim to take a look at. If you picked up a Threadripper system, you might be rocking a 2000 series or a 3000 series. The 2990WX famously worked better on Linux than on Windows, but what about the new Threadripper high-end desktop? These are not Pro CPUs; they're meant for high-end desktop. They've got a hefty price tag, but the performance is pretty awesome. Let's take a deeper look and see how it works with Linux. [Music]

So first up, thanks to AMD for sending these over so that I could test drive them with Linux. There's a lot of performance here, but there are a couple of things that need to be addressed, or things you should be aware of, if you're going to use this on the Linux platform. Performance is not one of those things: it's breathtaking, basically the fastest thing you can get right now, period. DevOps? Yep. Developer workloads and kernel compiling? Yep. Things that need lots of memory bandwidth? Okay, well, you're probably going to need Threadripper Pro for that one. But still, lots and lots of I/O lanes: not as many as Threadripper Pro, but still a ton, way more than desktop.

For our benchmarking we're using the Phoronix Test Suite for most of our actual real-world benchmarking, but I also use this as a workstation, using Pop!_OS as a base but with the latest 6.x series kernel, because you really need a lot of those bleeding-edge patches for this platform. Okay, you don't have to have them; it actually does work fine without them. But I kind of want the bleeding-edge features because you get just a little bit better performance all the way around.

Now, the first benchmark I call your attention to is the memory benchmark. I'm comparing this to the last-gen 32-core Threadripper Pro just to give you an idea. This is a four-memory-channel platform, and we're testing this in the TRX50 motherboard from ASUS, which I've also covered separately in another review on the main Level1Techs channel, so if you're more
interested in this board, check it out. It's pretty cool. It's got optional remote management (IPMI, out-of-band management, whatever you want to call that), it's got 10 Gb Ethernet and 2.5 Gb Ethernet, the Aquantia chipset, and it's got lots of PCIe lanes and lots of M.2. It doesn't have any MCIO x4 or x8 connections; I really wish that it did. But overall this is a pretty solid board with a server-ish BIOS. It doesn't have any of the gamer nonsense; there's not even a single RGB header on this motherboard.

But the first and most important thing is the memory bandwidth. To understand this: I think if you're going to get this workstation, you're only going to get it in a four-memory-channel configuration. You can run this in a dual-channel configuration, meaning that you only populate two of the DIMMs, but it's one DIMM per channel: four DIMMs, four memory channels. It's also registered error-correcting memory. This is something that is important to a lot of you, but some of you may not know the difference between error-correcting DIMMs and registered error-correcting DIMMs. There is such a thing as unregistered error-correcting DIMMs. I think this is sort of the final nail in the coffin of unregistered error-correcting DIMMs, though, because there's nothing not to love about AMD's implementation of registered ECC memory.

I could actually go into a whole separate video about how awesome the registered ECC memory is, because even though this is DDR5-5200 as the maximum supported speed, they still managed to get the NPS4 (that's four NUMA nodes) latency down under 70 ns with registered ECC memory. This is no small feat, and with a little bit of tuning you could do like 65 ns or 63 ns; it depends on your kit of memory and some other stuff that's coming. But DDR5-6400 is also possible on this platform, those higher-clocked DDR5 memory speeds, and you can get into a situation where you've got some impressively low latencies. If you want to stick to JEDEC standards, of course, you can run just no-bling server memory. I was
using G.Skill's DDR5 Neo kit, so if you want to pick up the same kit, there's a forum thread. You can basically paint by numbers with this machine and be reasonably assured that basically everything works on Linux out of the box, including audio. Everything except S3 sleep, which is a little rough around the edges: it does work, but it doesn't work perfectly. We'll talk more about that in a minute, and that's going to get fixed with a BIOS update, don't worry. But overall, a pretty solid platform.

But you still only have four memory channels to work with, so you would expect it to be half the performance of DDR4 Threadripper Pro at roughly the same clocks, because it's the same bit width. But no, actually, the memory controller is more efficient here. There's not really a huge difference in terms of the raw mbw bandwidth between our eight-channel Threadripper Pro platform running at 3600 (which is, you know, beyond what you would get with the registered error-correcting memory variant on last-gen Threadripper Pro) versus DDR5-6400 on the Threadripper high-end desktop platform, the Threadripper 7000 platform in other words.

And it turns out that for developer and DevOps-type workloads, you're not really super constrained by memory bandwidth. But we are testing the 64-core and the 32-core Threadripper 7000 series parts. The 32-core, I think, is going to be a sweeter spot for your Linux workstation. If you're doing rendering, lots of background tasks, stuff like that, 64-core might make sense for you. But certain workloads do actually have a lot of DDR5 memory pressure, so maybe 48 cores is possibly the sweet spot for these DevOps-type workloads. But, you know, 32 cores with the higher clock speeds versus 64 cores at lower clock speeds: it just depends on what your jobs are. You're going to have to understand the geometry of your workload to really make a decision here.

But I digress. If we look at benchmarks like Rodinia, it's like, okay, yeah, we moved from a 32-core to a 64-core platform, even from
DDR4 to DDR5, and it doesn't make a huge difference for these computational fluid dynamics type problems. Partly this is an optimization problem, and partly this is because the Threadripper Pro 5000 series was pretty darn good in its own right. But we are moving up to PCIe Gen 5 and DDR5.

For most of our multimedia benchmarks it's pretty much the same story. There are some significant core-on-core gains here, but unless you're running lots of multimedia stuff in the background, the difference between 32 and 64 cores is really not a doubling. I've got to get my hands on Threadripper Pro to figure out if this is memory bandwidth related, or just gen-on-gen, or just that the optimizations really nailed it for the Threadripper Pro 5000 series.

Timed Linux compilation, on the other hand, shows a very significant speedup, and that's not something that historically scales well with a ton of cores. So that's 32 and 64 cores on the Threadripper 7000 series versus our 32-core Threadripper Pro 5000 series: eight memory channels versus four with 64 cores. It's not exactly an apples-to-apples comparison, but over 100 seconds of savings in the allmodconfig kernel build is pretty darn good.

GROMACS, on the other hand: more cores is more better, right? More Zen 4 cores is more better. Nearly a doubling from our older-gen 32-core part to our newest-generation 64-core part, not bad. OpenVINO is one of those really interesting benchmarks. OpenVINO benefits tremendously from 7000 series Zen 4 cores.

I also experimented a little bit with ZenDNN. I don't really 100% know what I'm doing here, but running ZenDNN and limiting myself to 32 threads on the 64-core platform versus the 32-core Threadripper Pro, it seems that this workload does benefit from the extra memory bandwidth of the Threadripper Pro platform, because the performance is pretty close to parity. Meanwhile, increasing the batch size to 64, yeah, we did double the performance. So is it that there's more residency on each individual core, like
everything can live in cache? That would seem to suggest that it's not a memory bandwidth issue. But then, on the other hand, I don't know; I'm not sure what to make of that result.

I'm also not 100% sure what to make of our OpenCV results for image processing. For video, it's about like I'd expect: Zen 4 is just roflstomping everything. But for image processing and everything else, it seems as though it's more I/O bound, or more bound by the rest of the parameters of the system; it doesn't really matter as much in terms of per-core performance.

Now, in terms of gaming, you might have noticed that I've got the ASRock Aqua 7900 XTX in this system, and boy howdy, let me tell you, this system can game. If you're a game developer and you're developing on Linux and you want to run Windows VMs, I'm still working on the IOMMU testing; that's not quite done. But Proton is what I'm talking about here: everything Steam related, everything native to Linux. It is breathtaking gaming performance. It is really amazing. And again, 64 cores versus 32 cores: 32 cores is really the sweet spot for that kind of performance. But hey, if you want to play Baldur's Gate at a bazillion FPS on the 7900 XTX natively via Proton under Linux, you can do it.

Perhaps one of the most remarkable things about this is that with Pop!_OS (with, admittedly, some tweaks and updates for Mesa and video drivers for the 7900 XTX, the latest kernel, et cetera) we're basically at performance parity with Windows. It's really close, shockingly close. Will 2024 be the year of the Linux desktop? It can't get here soon enough. Probably not, but I love that Linux works as well as it does for desktop and productivity tasks at this point. It's not just server where Linux has completely taken over the universe; oh, we're trying on desktop. Woo!

Let's talk computational fluid dynamics. I don't have access to a lot of software to do this. I've asked Ansys and many others for access to software in order to do this kind of testing, because this is right up our alley: physical
simulation. Like, if you're an inventor and you need to test how radio waves are going to interact with things, or PC board losses, or the fluid dynamics that we always talk about, it's nice to be able to simulate those things, and that's basically becoming a commodity, which really helps lower the number of prototypes that you need to do anything. This is used extensively in microprocessor design, or even just for a newer, more efficient washing machine. Really awesome.

Thanks to the awesome people in the Level1Techs forum, I've been able to get access to COMSOL and run simulations with COMSOL. So running this particular simulation with COMSOL on our 128 GB Threadripper high-end desktop machine, it's about an hour and 9 minutes. And to put that result in perspective: with a 64-core machine, okay, that's pretty cool, but our Xeon comparison system takes over 3 hours to run. That's not exactly an apples-to-apples comparison, because I don't have a large library of these systems yet; I only just got access to COMSOL because, again, awesome Level1Techs forum community members. But this is really cool.

One reason that you might want to go with Threadripper Pro, though, is that instead of four memory channels you have eight. This is a program that uses not only compute but also memory. It's a lot easier to build a system that has 512 GB or a terabyte of memory when you have eight memory channels instead of four, because you can just use physically more DIMMs. That's also faster. Another reason that you might want to do that is that the people who make COMSOL and Ansys and other software like this generally tend to certify the Threadripper Pro platform first. AMD has been so relentlessly and so consistently executing on their high-end and workstation platforms that now software companies are taking notice, and they're going through the motions to say yes, an AMD platform will work fine with our software. Very exciting to see, and the results speak for themselves.

If you are feeling adventurous with this
ASUS TRX50 motherboard, you should know about the Gen 4 U.2 connection on the front. I tried to use it with just regular old Gen 4 cables; you're probably going to need cables that have a built-in redriver to use it with a Gen 4 SSD. I've also got some Kioxia CM7 drives. Those are Gen 5, and those worked fabulously well off of PCIe eight-lane and four-lane MCIO connectors on server-class motherboards. We really need those four- and eight-lane MCIO connectors on motherboards to be able to take advantage of all of the PCIe Gen 5 lanes that are available. I mean, yeah, sure, the PCIe slots themselves are Gen 5, but Gen 5 storage like the Kioxia CM7 Gen 5 SSD in a workstation like this would not be unheard of, and that's really where we should be when we're doing high-end desktop versus consumer-grade Gen 5, in my opinion. But, you know, maybe that's a video for another day.

I'm Wendell, this is Level1, and I'm signing out. You can find me in the Level1 forums. If you have any jobs or anything that you want to run on this platform, let me know. Let's take a closer look. [Music]
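
The memory comparison in the captions (four channels of DDR5 against eight channels of DDR4-3600) can be sanity-checked with some quick arithmetic. The sketch below is not from the video: it assumes a 64-bit data path per memory channel (ECC bits excluded) and computes theoretical peak bandwidth only, so measured numbers from a tool like mbw will be lower. The function name and the configuration labels are made up for illustration.

```python
# Theoretical peak memory bandwidth: channels * transfers/s * bytes per transfer.
# Assumption (not from the video): each channel carries 64 data bits per transfer.

def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                       bus_width_bits: int = 64) -> float:
    """Return theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

# Speeds and channel counts are the ones mentioned in the captions.
configs = {
    "4 channels, DDR5-5200 (max supported speed)": (4, 5200),
    "4 channels, DDR5-6400 (higher-clocked kit)": (4, 6400),
    "8 channels, DDR4-3600 (last-gen Pro, as tested)": (8, 3600),
}

for label, (channels, speed) in configs.items():
    print(f"{label}: {peak_bandwidth_gbs(channels, speed):.1f} GB/s")
```

On paper, four channels of DDR5-6400 come to about 89% of eight channels of DDR4-3600, which is consistent with the "not really a huge difference" remark, while four channels at the same transfer rate as eight would indeed be half.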
Info
Channel: Level1Linux
Views: 41,927
Keywords: technology, science, design, ux, computers, linux, software, programming, level1, l1, level one, l1Linux, Level1Linux
Id: RxYdCeRH1MA
Length: 12min 18sec (738 seconds)
Published: Mon Nov 20 2023