NUMA NUMA make Raspberry Pi go ZOOMA

Video Statistics and Information

Captions
There's something off about the Raspberry Pi 5. As I've been testing some of its competitors, I noticed something. Do you see it? You might have to look really closely. Well, not actually. All these Rockchip boards have the same A76 CPU cores, and their multicore score is just about four times their single-core score. I mean, you'd expect that if one core gets like 800, then four should get four times more, right? But the Raspberry Pi 5? Its chip is only about two times faster using all four cores. That's weird. And granted, the overall architecture is different between these chips, and the Rockchip boards do have four more cores, but those are slower A55 cores. That's not enough to explain just how badly the Pi 5 performs compared to how it should perform, ideally.

I think the first clue about what's going on can be found just by looking at the boards. Look at the Pi 5, specifically the memory chip. Now look at every single one of these Rockchip boards. See the difference? All the Rockchip boards have two memory chips. Having multiple physical chips can make memory access faster, and RAM is one of the biggest performance bottlenecks nowadays. But that doesn't explain everything. The LPDDR4X memory on the Pi 5 can handle over 4,000 megatransfers per second, and the Pi 4 was also gimped a bit, only being about 2.3 times faster on multicore. The bigger problem, I think, is the memory controller on the chip that powers the Pi. From all my testing, it seems like it leaves some performance on the table, but I'm not sure why.

What really brought this into focus, though, was a bunch of recent news reports: "Raspberry Pi 5 patch boosts performance up to 18%," "Kernel tweaks improve Raspberry Pi performance," and on and on. 18% faster with a 100-line patch? I mean, it's possible. NUMA is something I've mostly encountered on my 128-core Ampere server hardware, and in the BIOS for that thing you can choose how the CPU cores are laid out for memory access. You can use one giant 128-core chip, set it to emulate two 64-core hemispheres, or four 32-core quadrants. That can make a huge difference for certain types of programs, since it changes how the cores address the RAM on the system. So at first glance, NUMA emulation could help. I mean, four cores on the Pi 5 is 32 times fewer cores than the server, but who knows? My test results don't back up the 18% claims, but they do show an improvement. I have a couple of theories on what happened, but I'll get to those later, and I'll talk about another big performance gain you can get without recompiling the Linux kernel. More on that in a bit.

But first, the NUMA emulation patch. An engineer at Igalia posted this patch to the Linux kernel mailing list. He writes that it can bring a significant performance uplift on the Raspberry Pi 5; specifically, Geekbench 6 scores can increase up to 18%. I couldn't replicate that, but I did see a 12% speedup for Geekbench and a 15% speedup for Linpack, so it's definitely got something to it. This patch isn't for the casual Pi user, though. Right now, to apply it, you have to pull it out of the kernel mailing list and rebuild your Pi kernel, enabling NUMA emulation support. Then you have to set this setting in your kernel command line file, reboot, and run any software you want to speed up with this numactl command. Now, that's a lot, but the idea is that if this could be put into the Linux kernel, or adopted in Raspberry Pi's own Linux fork, you could see a 5 to 10% speedup across the board.

I ran my tests with Geekbench and my Top500 HPL benchmark, and both showed consistent improvements. Not only that, power efficiency was a little better too. I could get 2.76 gigaflops per watt with NUMA emulation and only about 2.5 without. That means this patch makes things faster and makes the Pi 5 more power efficient at the same time. What's not to like? Well, the Linux kernel maintainer reviewing the patch mentioned a lack of documentation, for one, and said that the way this is architected, it might make more sense to do it in the firmware or bootloader instead of in the Linux kernel. However, Igalia's engineer pointed out that this expands existing functionality on x86, where you can already fake NUMA for CPUs to experiment with different memory layouts. So will this get merged in? Maybe someday. Raspberry Pi could take it on in their own fork in the meantime, but I haven't seen any murmurs around that yet.

Another question I've seen is whether this code could speed up Rockchip boards too. After all, they have their own weird A76 plus A55 big.LITTLE architecture that might do better with NUMA emulation. Well, the biggest challenge there is that Rockchip doesn't publish updated kernel sources I could use to test it. It's a little messy on that side, and I'm not about to spend a few days just trying to hack older Linux source code.

But getting back to the topic of kernel patches to improve performance: what if there was a way to suddenly save 50 megawatts of power every year just by implementing a sleep state for older Raspberry Pis? Well, assuming all 25 million Pi 1s and Pi 3s in the wild are actually running 24/7, which is a bit of a stretch, that's exactly what this kernel patch would do. The Raspberry Pi doesn't use much power when it's running, but it could use even less if it implemented sleep states. Basically, when a Pi is idle, it's still burning up energy waiting for the next bit of work to happen. You can turn off certain parts of a chip when it's not doing anything and save some power, or you could even put a whole system in a sleep mode, but that's not quite what this patch does. The patch is still under review, but for the original Pi Model B, Pi 3 A+, and Pi 3 B+, it adds suspend-to-idle support. There's some issue with the USB chip not being able to power down, but other than that, it does shave off 22% of idle power draw on those Pis. And it'd be nice to see if we can get that kind of power savings at idle on a Pi 4 or Pi 5 too.

These things all might happen someday, but what about something you can benefit from today? Well, five years ago I wrote about how the latest and greatest A2 microSD cards had really terrible support and could actually be slower than A1 cards. Like, this SanDisk Extreme A2 card only gets a little over 1,000 IOPS random write and 3,000 random read. That's slower than the minimum 2,000 and 4,000 in the A2 class specs. But as I wrote in my blog post, the A2 speeds rely on command queuing and RAM caching to work well. Raspberry Pi just implemented support for command queuing, and just like DRAM-less NVMe SSDs can go faster using the system's RAM, now A2 microSD cards can go faster on the Pi 5 using command queuing. How fast? Well, two to three times faster, which is pretty significant for some things like launching apps or upgrading your Pi. These numbers were from Raspberry Pi's Diagnostics tool, but when I ran my own benchmarking, I noticed it doesn't make any difference at all for sequential access.

But for all the random I/O gains, there's a catch. Get it? A catch for the cache? microSD cards are already finicky when it comes to reliability. Some things, like pulling out the boot drive from a booted Pi, are dumb, of course, but right now your Pi would usually survive that. One Pi engineer in this thread mentioned this feature isn't enabled by default yet because of how it handles surprise card removals. The command queuing doesn't handle that well, and you can end up with more corrupt data. If you want to test this out on your Pi 5, you have to have an A2-class microSD card. Then you can add the sd_cqe parameter in your config.txt file and reboot. I'm not sure if or when this will become a default option, but if it does, that'll be nice for people running the latest and greatest microSD cards. For my own needs, I'm leaning more on M.2 NVMe SSDs now, and I posted a video going over options for that earlier this year.

Now, thinking about why my testing only showed a 12% improvement while Igalia's test showed 18%, my main theory is that maybe the engineer running the test was running an older Pi release for the baseline. Last year, some people discovered the 4 GB Raspberry Pi 5 was actually performing better than the 8 GB Pi 5, and after figuring out it had to do with the SDRAM refresh rates, Pi engineers pushed out a fix for it in April this year. If the Igalia engineer was comparing against older numbers, maybe that was tainting the results a bit. I know my first Geekbench 6 run from way back in September last year was only 1507 multicore. If I take my score with NUMA emulation enabled, that's actually a 25% increase. So I'm not quite sure what happened, but this is one of the reasons why benchmarking is both a science and an art. You can just compare numbers, but context and the numbers you choose are both equally important.

One thing that's a little more of an art, however, is pushing the Pi's clock beyond its limits, and in a future video I'm going to see how far I can push an overvolted Pi 5. You didn't know it could be overvolted? Well, I didn't either, until I saw this post from jron. Subscribe if you want to see me hack away at this Pi 5 even more. And until next time, I'm Jeff Geerling.

[Outtakes] I noticed something. The HVAC is on, that's what I noticed. Granted, the over... can be explained by just... But this patch isn't for the casual... You could see a 5 to 10%... Two to three times faster, which is pregnant... But Raspberry K... Raspberry Pi could also take it in on their own... take it on...
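The arithmetic quoted in the captions above can be sanity-checked with a short Python sketch. This is only an illustration: the function names are made up for this example, the single-core score of 800 is the round figure used in the video rather than a measurement, and "about 2.5" gigaflops per watt is treated as exactly 2.5.

```python
def scaling_factor(single_core: float, multi_core: float) -> float:
    """How many times faster the multicore score is than the single-core score."""
    return multi_core / single_core


def percent_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Ideal scaling from the video's example: one core scores ~800, four cores
# would ideally score four times that (hypothetical round numbers).
ideal_multi = 800 * 4
print(scaling_factor(800, ideal_multi))

# Power efficiency quoted in the video: 2.76 gigaflops per watt with NUMA
# emulation versus about 2.5 without (assumed here to be exactly 2.5).
efficiency_gain = percent_gain(2.5, 2.76)
print(f"{efficiency_gain:.1f}% better gigaflops per watt")

# A 25% increase over the September baseline of 1507 implies roughly this
# multicore score with NUMA emulation enabled.
implied_score = 1507 * 1.25
print(implied_score)
```

Under those assumptions the efficiency gain works out to roughly 10%, in the same range as the 12% Geekbench and 15% Linpack speedups mentioned in the video.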
Info
Channel: Jeff Geerling
Views: 133,164
Keywords: raspberry pi, linux, kernel, recompile, numa, emulation, fake, x86, code, patch, lkml, lore, rockchip, rk3588, rk3588s, orange pi, pi 5, radxa, rock 5, model b, cm5, bcm2712, performance, efficiency, geekbench, hpl, linpack, high, top500, benchmark, benchmarking, a2, microsd, command queueing, cqe, enable, cache, caching, nvme, storage, gigaflops, optimize, igalia, engineer, rpi
Id: 2ZrVyFCkOew
Length: 9min 1sec (541 seconds)
Published: Fri Jul 12 2024