The CPU She Told You Not to Worry About (Gaming + Windows on Arm!)

Reddit Comments

The Ampere Altra dev platforms are so cool. I'm scheduled to get my hands on one soon and can't wait.

👍︎︎ 1 👤︎︎ u/PurpleGDev 📅︎︎ Jul 03 2023 🗫︎ replies
Captions
This is ADLINK's Ampere Altra Developer Workstation, and the model I have used to be the fastest Arm desktop in the world a couple weeks ago. This has a 96-core Arm CPU, but now they sell a 128-core version, and Apple just released the M2 Ultra Mac Pro, so this isn't the fastest in the world anymore. But it's close, and I actually doubled my performance from last video. But that video ended with a couple cliffhangers: Windows install failures and graphics card support. Can this thing harness the power of this massive 4090?

Let's start with Windows. And yes, it works now. A firmware upgrade was all it took to get Windows 11 Pro installed, and this isn't some janky, hacked-together version. This is the standard Windows on Arm installer you download from Microsoft directly. Well, sorta. I'll get to that. But I installed Windows, ran Cinebench, installed a driver for the built-in GPU, and I know what some people are already asking: can it run Crysis? Of course it runs Crysis. It runs Steam just fine, and I got Crysis and a few other games installed, and Crysis runs just... it runs a bit slow. Well, maybe that's an understatement. "Runs" is being generous here. But the fact this non-x86 hardware even loads it at all, playable or not, is a pretty significant achievement.

And I tried out other stuff too. I downloaded Adobe Creative Cloud, signed in, installed Photoshop, and did some of my normal raw editing workflow, and it was actually fast. Fast enough I didn't feel hampered by it. I tried running PugetBench, but this is where you start seeing the cracks in Microsoft's approach to Arm. Creative Cloud, which admittedly isn't what I'd call good software, seems to have bugs with managing plugins under Windows on Arm. Microsoft's translation layer isn't perfect. Unlike Apple's Rosetta 2, it's a bit slower and has no hardware assist, and it can't achieve the same compatibility Apple does on M-series CPUs. And drivers... well, drivers for Arm on Linux are in a pretty good place, we'll get to that shortly. But drivers for Arm on Windows? That's, that's
a whole different story.

Let's start with installing Windows. So at this point there are a dozen or so Arm Windows desktops. Heck, Microsoft even makes one, the Windows Dev Kit 2023, their Project Volterra, that I already covered. There's also the Surface and other Windows Arm laptops. So why doesn't Microsoft have an easily downloadable Arm image for Windows? I don't know. I had to go to uupdump.net, download an arm64 Windows 11 Insider Preview build, use a Windows PC to flash that to a USB drive using Rufus, and then I had my installer ready. Ampere has their own guide for this process, because you can't find arm64 Windows at all on Microsoft's website. The Windows installation media tool doesn't support Arm, and the Windows 11 download page only shows x64 ISOs.

Anyway, I created an installer and plugged it in. Last video it would just crash during boot, but since then ADLINK released version 10 of the firmware with the UEFI bug fix, and now Windows can install just like any other PC. And just like other PCs, I still ran into the idiotic Microsoft account requirement. Seriously, who in their right mind would be this pushy about requiring a cloud account just to log into your local computer? Yeah, you can get around it by unplugging your network before installing, then pressing Shift+F10, then typing a bunch of gibberish in the console, then clicking all kinds of dumb prompts that make you feel like an idiot. And yes, I know Rufus can also bypass it, but that's not the point. It's still dumb. Sorry about that, let's turn the rant mode off.

But now we have a Windows 11 install running native on this thing. Neat! Except the graphics are a bit wonky. I fixed that by installing this ASPEED driver, and now I can get HD resolution through the tiny chip on the motherboard. I should note the driver is pre-release. ASPEED is working on getting it certified, so hopefully it'll be public soon. This little ASPEED chip isn't going to burn rubber in the latest racing game, but it can render Windows at 60 Hz
just fine.

At this point I wanted to start testing a bunch of things. All the things. And there were a lot of highlights, like Firefox: it has a completely native Arm build for Windows, so it's blazing fast on this machine. And unlike tiny single-board computers, this thing could play 4K content on YouTube all day with no dropped frames.

But how about some harder things? Well, one problem that's not specific to this hardware is there are very few things optimized for Windows on Arm. It's a chicken-and-egg problem. But even something that's not well optimized can be passable if you just throw 100 CPU cores at it. Like Cinebench: this benchmark is actually one of the worst scenarios, because it's optimized completely for x86, not Arm. A lot of routines just aren't even in the Arm architecture. But even so, with all 96 cores going full blast, this computer still chews through renders. Of course, the end result is barely faster than some older desktop chips, but that's running under emulation.

But another thing I found is, while I was running this test, I ran into a strange issue you'll only see with high-end workstation or server CPUs like this one. Windows traditionally doesn't know what to do with more than, like, 32 cores. In fact, until recently in Windows 10, you had to buy a special enterprise Windows edition just to address more than 60 cores on a CPU. But that assumption in Windows means a lot of apps don't scale beyond a few dozen cores. Like, by default Cinebench only used 60 cores. I had to go into the BIOS and change the CPU core ANC mode from monolithic to quadrant to get all 96 cores in use. I won't get into the details of what all that means, mostly because I'm not an expert on CPU architecture, but the bottom line is there are trade-offs. Like in the default monolithic mode, which works great with all the Linux tests I ran, Windows only ever shows apps 60 CPU cores at a time. And strangely, I could sometimes get higher scores on monolithic, only hitting 60 cores, than with quadrant running on
all 96. Why is that? Well, this is partly speculation, but the different modes make CPU caches work a little differently. With so many little CPU cores, the processor's built-in L2 and L3 caches are routed differently, and with hyper-focused software like Cinebench, my best guess is some routines that are optimized for x86 kind of blow up when they get to Arm processors. So the problem could actually get worse if you go full steam on all the cores.

And that's a good transition to Steam on Windows. Steam must have updated recently, because the first time I was testing this in April I actually had some UI issues, but those are gone now. It loads and lets me install and run games just fine. But not all games work. In fact, most of the older games I tried had issues. Like Star Wars Pod Racer tried installing a 32-bit DirectX version, but then it would just die and not start up. Quake 3 Arena would open, but then the console would pop up warning OpenGL couldn't load. Some of this might be down to running on Arm, and other things because Arm graphics drivers on Windows barely exist. But in either case, there are a lot of apps and games that won't work yet, or might even never work.

But one app that should work, but I've had trouble with, is Geekbench. Geekbench 6 actually gave a result, but while I was monitoring CPU cores, it seemed like most of the multi-core tests would only hit 60 cores at a time. I could even see that just by power usage: during Geekbench runs I never saw more than about 170 watts of power use, even though during Linux benchmarking it would get up past 220 watts. On the Linux side all 96 cores worked, so it might just be, like, Windows libraries that aren't expecting so many cores. Again, not sure. I actually opened a support request with Geekbench to try to figure out the problem.

Jumping over from games to general 3D graphics, the Heaven benchmark won't run at all because DirectX 11 couldn't find a GPU, which is fair, since the little ASPEED doesn't do 3D graphics at all. But anything that
relies on 3D rendering isn't going to work until graphics cards are supported for Windows on Arm, and I don't see any timeline for that yet. I think Qualcomm likes it that way, because right now the only way to really get 3D acceleration on Windows on Arm is with their chips.

Anyway, I spent a good deal of time day to day in Windows, and for general productivity like web browsing and even photo editing in Photoshop, this thing could be my daily driver, no problem. I just hope Microsoft keeps investing in Windows on Arm and convinces the rest of the world it actually matters. There are a lot of device drivers that just won't work in Windows, like the basic Intel gigabit Ethernet driver. I had to use this external USB adapter because that's actually supported for Windows on Arm laptops and tablets.

But doing a 180, I'm going to switch back to Linux, because that's where this machine really shines. First off, I didn't mention it, but early in my testing I upgraded the system from four sticks of 16 gig ECC RAM to six sticks, meaning I went from 64 to 96 gigs of RAM. You wouldn't think it, but that actually made a huge difference in performance. I got about 400 gigaflops with 64 gigs and 600 with 96, and Geekbench went from 30,000 to almost 36,000.
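
The before-and-after numbers from the RAM upgrade can be put side by side as relative gains. This is a minimal Python sketch using the rounded figures quoted in the transcript; the helper name `percent_gain` and the dictionary labels are hypothetical, and the Geekbench figure is approximate because the video only says "almost 36,000":

```python
def percent_gain(before: float, after: float) -> float:
    """Relative improvement from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Rounded figures as quoted in the transcript (four DIMMs vs. six DIMMs).
# The Geekbench "after" value is an upper bound ("almost 36,000").
results = {
    "RAM (GB)": (64, 96),
    "Linpack (gigaflops)": (400, 600),
    "Geekbench score": (30_000, 36_000),
}

gains = {name: percent_gain(before, after)
         for name, (before, after) in results.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain:.0f}%")
```

Under these inputs the Linpack gain matches the increase in installed DIMMs, while the Geekbench gain is smaller.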
Power usage jumped up a bit too. I actually did a double take and had to test out both my Kill A Watt and the Sonoff S31 adapter, but both were within a watt of each other. With four sticks of RAM I was seeing about 200 watts of power draw under load. Putting in two more sticks and changing nothing else, the system used 235 watts. That's a 35 watt difference. A stick of DDR ECC RAM uses maybe five or six watts, so two should just add 10 or 15 watts max. Where's the other 15 watts coming from? Well, I'm leaving out some of the details here, but basically this massive CPU has eight memory channels, and the COM-HPC carrier where the CPU sits has six memory slots. Each memory slot goes to a single channel. The more channels you fill, the faster the CPU can access memory, and what I'm guessing is happening is the CPU is activating more memory channels, thus consuming more power.

But with great power comes a lot more performance in this case. And I could even eke out more gigaflops, in my case 985 of them, by using an Ampere-optimized math library instead of compiling a more generic one. I followed these directions to install a special library to use with Linpack, and doing that I could get 985 gigaflops using 270 watts of wall power. That means performance efficiency for this machine went from 2 to 3.64 gigaflops per watt, which is nearly as good as my Apple M1 Max. Not bad.

But we've been talking a lot about the CPU. I left off the last video just hinting at GPU support. Windows on Arm has pretty much no support, but in Linux support is already good and getting better over time. I tested my 3080 Ti extensively under Ubuntu, and I can confirm a 4090 just barely fits in the system, but because the power supply isn't quite a thousand watts I didn't want to risk overloading it. So 3080 it is. Just like for installing Windows, Ampere has a whole guide for how to install NVIDIA's graphics card drivers on Linux, and it worked well for me. I didn't even have to recompile the Linux kernel. I just
installed the Arm drivers from NVIDIA's website. And with a graphics card speeding things up, now I can get over a hundred frames per second in SuperTuxKart on max settings. I also ran glmark2, and I could get a score just under ten thousand. Then I followed Ampere's instructions to install Doom 3, or, well, dhewm3, the open source version, and running that I was getting a perfect 60 FPS the whole time. I could probably get a lot more, but for some reason it was locked to the monitor refresh rate, and when I tried running the built-in benchmark it segfaulted and crashed. So then I installed OpenArena, which is a little older but based on a similar older engine. On a little overclocked Raspberry Pi 4 I could get 90 FPS. On this machine it maxed out the engine at a thousand FPS, but it's more stable locked at a lower frame rate, something like a more reasonable 500 FPS. This is a silly example, but it just shows how modern GPUs can run even older games just fine, as long as you have driver support. Oh, Windows.

But speaking of Windows, Steam ran just fine there. Over in Ubuntu, I couldn't get it to launch at all. I got this exec format error, which means Linux tried launching Steam but kinda blew up, because Steam is only compiled for x86 on Linux. And since Linux doesn't have an x86 translation layer like Rosetta 2 or Microsoft's WOW64 engine, Steam won't launch at all. And I tried launching the Heaven benchmark too, but the same thing happened there. I won't hold my breath for Valve or other 3D graphics companies to support Arm natively, but that doesn't mean a fast GPU isn't already useful for Arm machines like this one. Machine learning apps like Stable Diffusion and Llama run just fine here, although the first time I tried, I completely screwed up my CUDA install to the point I had to reinstall Ubuntu just to get my drivers working again.

There's a lot to look forward to here, though. Microsoft showed off the Unity engine running natively on Windows on Arm recently, and Arm showed off a
bunch of game-related developments at Mobile World Congress '23.

And what about AMD GPUs? It looks like they'll actually work too, but right now it does require a kernel patch, since the AMD drivers still run into cache coherency issues on Arm. That's actually one of the problems I ran into on the Raspberry Pi as well, though the Ampere's PCI Express bus, all 192 lanes of PCIe Gen 4, is a lot more robust than the single Gen 2 lane on the Pi.

So where do we go from here? Well, obviously, we go faster. Subscribe, because I'm going to do a live stream where I upgrade this beast from 96 cores all the way to 128, and we'll rerun the Top500 benchmark and see if we can break a teraflop. Until next time, I'm Jeff Geerling.
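
The efficiency figures quoted in the transcript (2 and 3.64 gigaflops per watt) come from dividing Linpack throughput by wall power, and the closing teraflop goal can be measured against the same numbers. The Python sketch below redoes that arithmetic with the rounded values from the video. The `Run` class and its field names are hypothetical, and pairing the 600 gigaflops result with the 235 watt reading is an assumption, since the transcript does not state that the two were measured in the same run:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One Linpack result plus wall power (hypothetical container)."""
    label: str
    gflops: float
    watts: float

    @property
    def gflops_per_watt(self) -> float:
        return self.gflops / self.watts


runs = [
    Run("4 DIMMs, generic math library", gflops=400, watts=200),
    # Assumption: 600 gigaflops and 235 W belong to the same configuration.
    Run("6 DIMMs, generic math library", gflops=600, watts=235),
    Run("6 DIMMs, Ampere-optimized library", gflops=985, watts=270),
]

TERAFLOP_IN_GFLOPS = 1000

for run in runs:
    shortfall = TERAFLOP_IN_GFLOPS - run.gflops
    print(f"{run.label}: {run.gflops_per_watt:.2f} gigaflops/W, "
          f"{shortfall:.0f} gigaflops short of a teraflop")
```

With these inputs the first and last configurations work out to 2.00 and roughly 3.65 gigaflops per watt, in line with the figures quoted in the video, and the best run sits 15 gigaflops short of a teraflop.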
Info
Channel: Jeff Geerling
Views: 125,494
Keywords: windows, ampere, arm, altra, max, m128-28, m96-28, cores, processor, cpu, test, benchmark, crysis, doom, dhewm, openarena, performance, gaming, fast, speed, cinebench, r23, geekbench, compatibility, wow64, gpu, nvidia, 3080 ti, 4090, adlink, developer, workstation, dev, kit, quadrant, monolithic, power, ram, efficiency, supercomputer, top500, testing, photoshop, adobe, exec format error, linux, ubuntu
Id: ydGdHjIncbk
Length: 13min 51sec (831 seconds)
Published: Mon Jul 03 2023