Raspberry Pi demolished by monster 128-core ARM CPU!

Captions
There are six Raspberry Pi computers in this little box. Each of these has four CPU cores and eight gigs of RAM, so in total there's 24 cores and 48 gigs of RAM, and at full tilt this thing only uses 40 watts of power. Six Pis should beat one CPU, right? And it ain't cheap: this whole cluster, including the six two-terabyte NVMe boot SSDs on the bottom, is 2,200 bucks. If you want a 24-core PC, you're looking at a 3960X. Admittedly that's just the CPU, but it's 200 bucks less. What a savings! But even if RAM, a motherboard, and all that other stuff doubled the price (which it doesn't), that Threadripper would absolutely destroy the Pi cluster's performance. But that's comparing apples to oranges. What if we could compare apples to apples? Well, for that I had to take a little field trip.

From STH, and today we're going to talk about an Arm server. Oh wait, it's Jeff Geerling! Hey Patrick. Hey Jeff, how are you doing? I see you have this awesome-looking Arm server behind you. Yeah, this is an Ampere Altra Max server; it's awesome. All right, can I come in and we can talk more about it? Yeah, totally, let's do it.

So, can you tell me about this server? Yeah, sure. So this server is actually based on an Ampere Altra Max M128-30, and that M128-30 means we have 128 Arm cores with a total of three gigahertz of clock speed. They can of course alter both of those, but this is basically the fastest Arm CPU you can buy today. Each of these little compute modules has four cores at 1.5 gigahertz, and my cluster has six of them, so this thing is at least 10 times faster, hopefully.
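As a rough sanity check of that "at least 10 times faster" estimate, here is a quick back-of-envelope sketch (mine, not from the video). It just multiplies core count by clock speed and assumes similar per-core, per-clock throughput on both chips, which glosses over the difference between the Pi's Cortex-A72 cores and the Altra Max's Neoverse N1 cores.

```python
# Rough sanity check of the "at least 10x faster" estimate.
# Assumes similar per-core, per-clock throughput on both chips,
# which ignores differences between Cortex-A72 and Neoverse N1 cores.

pi_cluster = {"nodes": 6, "cores_per_node": 4, "clock_ghz": 1.5}
altra_max = {"cores": 128, "clock_ghz": 3.0}

pi_core_ghz = pi_cluster["nodes"] * pi_cluster["cores_per_node"] * pi_cluster["clock_ghz"]
altra_core_ghz = altra_max["cores"] * altra_max["clock_ghz"]

print(f"Pi cluster: {pi_core_ghz:.0f} core-GHz")        # 36
print(f"Altra Max:  {altra_core_ghz:.0f} core-GHz")     # 384
print(f"Ratio:      {altra_core_ghz / pi_core_ghz:.1f}x")  # ~10.7x
```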
I noticed there's some other stuff too; look at all the RAM there. This thing has eight gigs of RAM. Yeah, so this system has a total of 512 gigs in it, and there are eight memory channels of DDR4-3200. You can put up to two DIMMs per channel, so you get a total of 16 DIMMs next to the CPU. And there's a lot of PCI expansion I see here too. Right, so this has PCIe Gen 4, and this is really designed, by the way, to be a competitor to things like the Intel Xeons and the AMD EPYCs, so you have tons of PCIe Gen 4 lanes, and you can put in things like even GPUs, which we're going to have pretty soon. And can I take one of these out? A hundred percent you can. So, a 3.84 terabyte NVMe SSD; I think that kind of beats the Pi a little bit, with its one lane of PCI Express Gen 2. Yeah, a little bit, a little bit.

Then there are dual redundant power supplies (80 Plus Titanium power supplies), and we have a bunch of fans back here. This is a form factor that's really designed for the edge. We've taken the risers out, but it's a shorter-depth system, and it's the type of thing where, if you wanted to put in GPUs for AI inferencing or remote desktops, or FPGAs for AI inferencing or networking, say you have a 5G base station, or you want to put in high-end networking or something like that, you can totally do that in this system. And speaking of that, built in I see there are a couple of cages for SFP-whatever. Those are SFP28, so 25 gig Ethernet, which is built in. And then there's also a BMC? Yep, so this system has a BMC for out-of-band management, and actually there's a second Arm processor here, the ASPEED AST2500 or 2600; that's another Arm processor that runs OpenBMC in this system. So if you put some DPUs in here too, you'd have lots of... Oh yeah, you could have many; it would be like Inception, basically. And then under here there was another thing too, for networking, right? Yeah, there's an OCP NIC 3.0 slot here as well, so there are lots of different ways you can configure this system, with risers and different network cards and what have you.
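Since the BMC mentioned above runs OpenBMC, the server exposes the standard Redfish REST API for out-of-band management. Here is a minimal sketch of polling it; the BMC address and credentials are placeholders, and exact resource fields can vary between firmware builds, so treat this as illustration only.

```python
# Minimal Redfish poll against an OpenBMC-style BMC (sketch only).
# BMC_HOST and the credentials are placeholders for illustration.
import requests

BMC_HOST = "https://192.0.2.10"   # hypothetical BMC address
AUTH = ("admin", "password")      # placeholder credentials

# Most BMCs ship self-signed certificates, hence verify=False for a quick check.
session = requests.Session()
session.verify = False
session.auth = AUTH

# The Redfish service root lists the available resource collections.
root = session.get(f"{BMC_HOST}/redfish/v1/").json()
systems_path = root["Systems"]["@odata.id"]   # usually /redfish/v1/Systems

systems = session.get(f"{BMC_HOST}{systems_path}").json()
for member in systems.get("Members", []):
    system = session.get(f"{BMC_HOST}{member['@odata.id']}").json()
    print(system.get("Model"), system.get("PowerState"),
          system.get("Status", {}).get("Health"))
```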
Just as a point of size comparison, here's the Pi cluster with six nodes. It's a little bit small; you could probably fit a few of these inside that box. One thing that would be really interesting is to see how these two compare in speed with a cluster benchmark, and that's what I'm going to do. But you know another thing that's interesting? You can't buy either of these systems from Micro Center, but you can buy just about anything else, and you know what, they're the sponsor of this video. I visit the St. Louis Micro Center all the time to pick up a hard drive, a power supply, or even 3D printer filament. They have everything. Well, except they don't sell the Ampere Altra Max, at least not yet, but they do sell Supermicro servers, and they even have a maker section that takes me back to my youth, when I'd pick up parts at an electronics store and hack together some cool device. Micro Center has locations all over the US, but if they're not yet in your town, visit them online at microcenter.com. They have an exclusive deal for new customers: 25 bucks off any AMD or Intel CPU. Use the link in my description and get started on your next build today.

So who would be deploying these things? The Ampere Altra and Altra Max are really used a lot of times by the cloud providers, so that would be folks like Oracle (I think they were the first to deploy it), but then you also have Microsoft Azure and Google Cloud. This is really the response chip to something like the AWS Graviton series. And then folks are looking at how you take the processors that are currently in the cloud and also deploy them at the edge, for things like 5G and all that kind of stuff, and that's really what this server is for. This is more of an edge platform, designed to have that continuity between cloud and edge. One of the things I want to see: you mentioned edge there, and I know there's one area where somebody would sometimes deploy a Pi cluster, because a Pi runs on five to ten watts of power. This thing? 100 watts? 200 watts? Idle, it's in the 120 to 130 watt range. So, a little bit more. Edge can mean a lot of different things: sometimes it's a deployment where there's just solar power or something like that; other times you have grid power with good backup and all that. So those are things people consider when they have a power budget and a price budget. But what I really want to see, just out of curiosity, is how much faster this is. It seems on specs that it should be 10 times faster, but I want to see how much faster it actually is than my Pi cluster. So I'm going to test the Pi cluster first, and then we'll run this thing.

One thing I like to do on my Pi clusters is run High Performance Linpack; that's the benchmark used for the TOP500 supercomputer list. At Supercomputing '21 I had a little competition with Patrick: we wanted to see who could build the best Arm cluster in a box. I built this Turing Pi 2 cluster, and it clocked in around 51 gigaflops, good enough to make it onto the supercomputer list, at least in the year 2000. Patrick, well, he showed off an impressive server with a bunch of expensive Nvidia DPUs with 56 Arm CPU cores, but he didn't run Linpack on it, so I'm going to call that one a tie.

At home I rebuilt my Pi cluster using this kit called the Super6C. It comes with almost everything you need to build a supercomputer in a box, except for the Raspberry Pis. Hopefully they'll be more available soon; even Eben Upton says they will be, I think, in one year, once Raspberry Pi has hopefully recovered from the lingering effects of COVID-19. I built a Ceph storage cluster first, and for that video I built a network storage cluster with six Kioxia XG6 SSDs, but Kioxia reached out and told me those are old-fashioned now, so they sent me six of these: the latest XG8 model. Patrick has a great review of them on the ServeTheHome website, link below.

Now, the number one reason people don't think Raspberry Pis are reliable is because they use microSD cards, and for that Ceph storage cluster I actually was booting them off microSD. But not anymore: the Compute Module 4 boots directly off NVMe SSDs, so I tossed out all the microSD cards this time. The Super6C board has an M.2 slot wired directly to each Pi, but to make sure I could actually boot off of them, I had to update the compute modules' firmware, then flash Raspberry Pi OS to each SSD. To update the firmware, I popped each of my compute modules into this Pi Tray Mini, plugged it into my computer, then ran the usbboot software from Raspberry Pi. I could actually update the firmware directly through the Super6C (it's nice that it has these little USB ports, one for each Pi), but I just like the convenience of my little Pi tray. In usbboot's config I set the boot order so that the last digit, the 6 there, would force the Raspberry Pi to try NVMe boot first. I have a blog post covering the process in detail, and I'll link to it below.
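The BOOT_ORDER setting mentioned above is a hex value in the Compute Module 4's bootloader config that the firmware reads one hex digit at a time, right to left. This sketch (mine, not from Jeff's playbook) decodes a typical NVMe-first value; the mode names follow the Raspberry Pi bootloader documentation.

```python
# Decode a Raspberry Pi bootloader BOOT_ORDER value (read right to left,
# one hex digit per boot mode). 0xF25416 tries NVMe (6) first, then SD (1),
# then USB mass storage (4), and finally restarts the list (f).
BOOT_MODES = {
    0x1: "SD card",
    0x2: "Network",
    0x3: "RPIBOOT (USB device)",
    0x4: "USB mass storage",
    0x5: "BCM-USB mass storage",
    0x6: "NVMe",
    0xE: "Stop",
    0xF: "Restart from first entry",
}

def decode_boot_order(value: int) -> list[str]:
    modes = []
    while value:
        modes.append(BOOT_MODES.get(value & 0xF, "Unknown"))
        value >>= 4
    return modes

print(decode_boot_order(0xF25416))
# ['NVMe', 'SD card', 'USB mass storage', 'BCM-USB mass storage',
#  'Network', 'Restart from first entry']
```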
I flashed all six Pis and made sure they did the happy little LED blink thing at the end, then I moved on to flashing the SSDs. For each SSD, I plugged it into this NVMe-to-USB adapter and plugged that into my Mac. I flashed each one with Raspberry Pi OS and set a hostname for each, like node 1, node 2, and so on. I also made sure to mark which was which and put one SSD under each Pi, so I could keep track of which slot they go into. I mean, I could have done better than the Post-it note, but that was the quickest way to keep track of everything.

Now it's time to load up the Super6C, and I'm really happy DeskPi listened to my feedback from my earlier video. They sent me this updated board after I mentioned the USB header had the wrong pinout and some of the ports were a little off. This new revision is much better and fits perfectly in the official Super6C ITX case. The case comes with three cooling fans, front-panel USB-A, a power and reset button, and it even has some screw holes on the sides that look like they'd be perfect for rack-mount ears, but so far they're not selling any. Hopefully that's a thing, because you can unscrew the rubber feet and this thing would be great in 1U of rack space. The case also comes with six giant heatsinks and all the screws and thermal pads to get it all installed, so enjoy this last bit of assembly. And yes, that's the LTT screwdriver making an appearance again; I'm actually surprised how many times its default bit set has come in handy, especially that 2.5 millimeter hex bit. I'm going to do a follow-up video on the LTT screwdriver now that I've used it a few months, so if you're into that sort of thing, subscribe.

One thoughtful addition DeskPi made for this case is the access panel on the bottom, so you can easily swap out microSD cards and NVMe drives. It beats having to take out the whole board just to get to the underside. Once I booted the cluster the first time, I noticed the fans are a little bit loud (I was getting over 50 decibels from six inches away), but they kept the Pis from throttling at the base clock speed, at least. The ports all line up perfectly now, and I like that the case even includes Wi-Fi antenna mounting holes if you need wireless. For more opinions on the Super6C board itself, go check out my older video.

With everything set up, I wrote an Ansible playbook to manage the cluster and to load in High Performance Linpack. The entire setup is open source, so if you want to replicate my results, go check it out on GitHub; you can even run it on your PC or Mac, as long as it can run Linux. The playbook compiles tools like MPI and ATLAS from source so they can get the best possible performance, then it runs the benchmark and spits out the result. At the base 1.5 gigahertz clock there was no throttling at all, even while all six Pis were spiking their cores to a hundred percent. After tuning the HPL settings a little, I was able to get 60 gigaflops, and the cluster used about 41 watts of power while it was running the benchmark.

I wanted to see how much faster it would be overclocked, so I set all the Pis to boost to 2 gigahertz and ran it again. This time power usage was all over the board, spiking up to 64 watts, and the benchmark was taking forever. I looked at the temps and saw the Pis were actually throttling; apparently those three little fans aren't quite enough for a cluster of overclocked Pis running full tilt. So I popped the lid and decided to go for my biggest fan: I pulled out my giant USB Noctua fan, turned it all the way up, and put it on top of the cluster. The temperatures were much better and the power was steady this time, about 51 watts. When the benchmark finished, the overclocked Pis mustered a little over 70 gigaflops.

Looking at the two results, I had 60 gigaflops using about 40 watts, and 70 gigaflops at 51 watts. If I divide gigaflops by watts, I get an efficiency of roughly 1.5 gigaflops per watt at the base clock and 1.4 overclocked. So that takes care of my cluster.
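Those efficiency figures fall straight out of dividing the HPL result by the measured power draw. Here is that arithmetic written out, using the numbers quoted in the video (Jeff rounds them to roughly 1.5 and 1.4 gigaflops per watt).

```python
# Gigaflops-per-watt from the HPL runs described in the video.
runs = {
    "Pi cluster @ 1.5 GHz (base)": {"gflops": 60.0, "watts": 41.0},
    "Pi cluster @ 2.0 GHz (OC)":   {"gflops": 70.0, "watts": 51.0},
}

for name, r in runs.items():
    print(f"{name}: {r['gflops'] / r['watts']:.2f} GFLOPS/W")
# Pi cluster @ 1.5 GHz (base): 1.46 GFLOPS/W
# Pi cluster @ 2.0 GHz (OC):   1.37 GFLOPS/W
```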
You got this thing back, you ran a bunch of benchmarks (you had a little adventure doing that too), but can you tell me what kind of performance we got? Yeah, so we ran the exact same script on this system as was run on the Pi cluster, and we're using ATLAS tuning and stuff like that to get the best performance we can, and we got around a total of one teraflops for about a 500-ish watt power budget. So that's about two gigaflops per watt. Yeah, something like that. More efficient than the Pi cluster? It is, yeah. A side note: my M1 Max Mac Studio got four gigaflops per watt, so there you go, a little more efficient there. Good job, Apple. A newer TSMC process as well, I think, is that true? Yes, yeah.

Can you also tell me a little bit about price? This Pi cluster, I mentioned, was about 2,200 bucks total with all this, and you don't need all the SSDs that I have in it. What kind of pricing are we talking about? Yeah, so this is definitely more than, like, a five-thousand-dollar system. In most cases you're probably going to be spending closer to ten on a system similar to this, in how you'd actually configure it to go and deploy. You can of course spend more; we have one that we're going to be reviewing pretty soon, that I showed you, and that will definitely be a much more expensive system. Subscribe to ServeTheHome; you're going to want to see that server. Yeah, many GPUs. But I think from a price-performance perspective, this is actually pretty darn good compared to a Raspberry Pi cluster. I mean, what, 13 times more? Just on that specific benchmark we're seeing 13 times more performance for maybe two to four times the price, depending on the configuration. Right, and you still have things like more networking, more expandability; we can put more storage in it. There are a lot of things you get beyond just the compute side that I think you have to take into account as well, especially when we get to these larger nodes. Like the fact that this Pi cluster has one-gigabit networking, and you're not going to be able to get too many bits and bytes out of this thing.

So, we ran Linpack, and the reason I did that is the TOP500 list; it's kind of cool to know how your computer ranks. Historically, you guys do a lot more in-depth benchmarking for a server like this than I do for the Pis; I'm usually just doing the basics. Talk to me a little bit about benchmarking methodology, and why Linpack may or may not be a helpful benchmark to know. Yeah, so I'll give you a couple of things. One: on a big chip like this, and you do this by the way on the Intel Xeons and the AMD EPYC Genoas and all that kind of stuff, you actually would split the chip up into usually four different quadrants, so NPS equals four, and what that does is localize the memory access and localize the data transfer on the chip, and that gives you a little bit better performance. There are other optimizations you can use in terms of math libraries and stuff like that. So we're going to say this is about one teraflops, using a similar type of build system and scripts to what we were using here, but when you see things like the actual TOP500 numbers, those definitely sometimes use custom compilers and all kinds of crazy BIOS settings to get better performance than we would have in something like this.
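For context on that one-teraflop figure, here is a rough comparison against the chip's theoretical peak. The 8 double-precision FLOPs per cycle per core is my assumption (two 128-bit NEON FMA pipes per Neoverse N1 core), not a figure from the video, so treat the result as a ballpark.

```python
# Rough theoretical peak for the Altra Max M128-30 versus the ~1 TFLOPS
# measured with HPL. The 8 FLOPs/cycle/core figure is an assumption
# (two 128-bit NEON FMA pipes per Neoverse N1 core), not from the video.
cores = 128
clock_ghz = 3.0
flops_per_cycle = 8           # assumed double-precision FLOPs per cycle per core

rpeak_gflops = cores * clock_ghz * flops_per_cycle   # ~3072 GFLOPS
rmax_gflops = 1000.0                                  # ~1 TFLOPS measured in the video

print(f"Theoretical peak: {rpeak_gflops:.0f} GFLOPS")
print(f"Measured HPL:     {rmax_gflops:.0f} GFLOPS")
print(f"HPL efficiency:   {rmax_gflops / rpeak_gflops:.0%}")   # ~33%
```

That gap is exactly what the NPS=4 partitioning, tuned math libraries, and custom compilers Patrick describes are meant to close on real TOP500 runs.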
So this thing: the price-performance is pretty good, and it's obviously better than the Pi cluster here if you need the performance. If you don't need it, and you're deploying at the edge and have a low power budget, that's great. But if we're talking about data centers, where these things often would end up, I know that Genoa just launched. How does this compare to a new platform like Genoa, or Intel, or anything like that? All right, so just taking a look at the chips: this is the Ampere Altra Max, which is 128 Arm cores, and this is the Genoa 9654, which is 96 cores and 192 threads. As a quick size comparison, this is the Raspberry Pi; that gives you some sense of how big these things are, and these are only the processors, by the way; they don't have the memory and all that kind of stuff on them. The key thing here, though, is that you do get a lot more in terms of performance and I/O: you get more memory bandwidth, more PCIe lanes, and faster PCIe lanes; this is PCIe Gen 5, this is PCIe Gen 4. And then the other side of it is that when we look at things like Linpack, a chip like this will probably do over five teraflops, and that's in a little bit higher of a TDP range. But when you actually look at total system power, for a dual-socket server with these it's maybe an extra, I don't know, 10 to 20 percent power when you're running Linpack versus running two of these systems, and you're getting way, way more performance; you're getting like 10x the performance. So I would personally take 10 to 20 percent more power consumption for that much more performance.

People say, by the way, that Arm is always more power efficient than x86. That's not always the case, because this chip is optimized not just for things like serving web pages but also for these HPC applications, whereas the Ampere Altra Max is really designed more for things like running nginx web servers and stuff like that. So this one is really focused on integer performance; this one is integer and floating-point performance. We're talking about, you know, two gigaflops per watt, maybe a little more than that, for these CPUs. But when I talked to you about this before I started this whole project, you mentioned that GPUs are well beyond that. Yeah, so if we look at a CPU like this, you might get something like, I don't know, 10 or so gigaflops per watt, and if you look at a GPU you might see something like 40, depending on the GPU. So as you go up the scale, you might get one to four for a desktop CPU, for a high-end server CPU maybe you're going to get 10, and then when we get to GPUs you start getting to 40. So one of the big things you see in the industry is using these Ampere Altra Max processors with high-end GPUs to augment the floating-point performance, and we actually did a video on that with the Gigabyte system, with the A100s plus this Altra Max. Yeah, your workload has to support that, though; not everybody's workload can just go over to GPUs, and not everybody wants to put ten-plus-thousand-dollar GPUs into a server just to save a couple of bucks on a CPU.
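Patrick's "I'd take 10 to 20 percent more power for that much more performance" can be put into rough numbers using the ballpark figures quoted in the conversation. These are his estimates, not measured results, and the dual-Genoa line assumes the high end of his 10-20 percent range.

```python
# Ballpark comparison using figures quoted in the conversation:
# Altra Max system ~1 TFLOPS at ~500 W; a dual-socket Genoa box at maybe
# "over 5 TFLOPS" per chip with roughly 10-20% more system power.
altra = {"tflops": 1.0, "watts": 500.0}
dual_genoa = {"tflops": 2 * 5.0, "watts": 500.0 * 1.2}   # assumes the +20% end of the range

for name, s in (("Altra Max", altra), ("Dual Genoa (est.)", dual_genoa)):
    print(f"{name}: {s['tflops']:.0f} TFLOPS, {s['watts']:.0f} W, "
          f"{1000 * s['tflops'] / s['watts']:.1f} GFLOPS/W")
# Altra Max:          1 TFLOPS, 500 W,  2.0 GFLOPS/W
# Dual Genoa (est.): 10 TFLOPS, 600 W, 16.7 GFLOPS/W
```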
You mentioned that when you got the TOP500 benchmark result, when we ran Linpack, you actually had a little bit of an adventure getting it running, and that's something I see as a theme. The Raspberry Pi compute module has a processor that's well supported by the Raspberry Pi folks themselves, so when I want an OS I usually use Raspberry Pi OS, but I've found that with a lot of other SBCs that use Arm CPUs, often you have to rely on an old Linux kernel or support for something, it's not documented that well, and it's kind of all over the board. How is it in the server space, especially with this chip? So I'll definitely tell you that we started doing Arm server CPU reviews, especially in the data center, around the Cavium ThunderX days, in like 2016, and back then, especially before Ubuntu 16.04, the world was absolutely terrible for Arm servers. Over the years it's definitely gotten a lot better, and at this point we just downloaded an Ubuntu image, installed it on here, and it generally worked. We had a little bit more of a challenge when things happened like, we actually had some disruption, I don't really know why, but we turned the system off and on, put it in the studio over here, and then the OS image was corrupt, so we had to reinstall it. Then OpenBMC didn't want to install it, so we had to go put a little USB drive in there, and that finally fixed it. Just little things like that you wouldn't typically see. And even with things like the newer Genoa systems or the Sapphire Rapids systems, which are still technically pre-production, you can definitely tell there is some difference between the maturity of an x86 server and an Arm server, but it's way closer than it used to be. These are deployable at this point, so I wouldn't say they're totally funky, just-science-experiments or anything like that, but there's still a platform difference if you want to go and deploy bare metal.

So, the 128-core Ampere server might cost three times as much, but it's way faster and more power efficient, and that's before we even talk about dual 25-gigabit Ethernet, 128 lanes of PCI Express Gen 4, and all the other nice things Supermicro stuffed in this box. But that leads to the question: who is this Pi cluster for? Well, the market for something like this is small, especially with Pis being so hard to get right now, but there are other CM4-compatible computers you could put on here, so I might test a few soon. Even with faster little Arm CPUs, though, the performance won't be anywhere near what you can get with a real cloud server. But it's still fun for experiments, and it could be useful in some edge use cases, like if you have limited power. I want to thank Patrick from ServeTheHome for letting me come out here and take a look at this new server, and I especially like STH's new Mini PC series, where they walk you through some of these awesome little fanless PCs you can use to build a custom router at home. So go subscribe to STH, both on their YouTube and their website; there are links below. Until next time, I'm Jeff Geerling.
Info
Channel: Jeff Geerling
Views: 896,287
Keywords: raspberry pi, arm, ampere, altra, max, m128-30, pi 4, cm4, compute module, super6c, deskpi, cluster, bramble, server, workstation, neoverse, n1, a72, a76, cores, threadripper, performance, benchmark, hpl, linpack, high, xeon, super, supermicro, edge, iot, compute, top500, green500, efficiency, power, usage, datacenter, patrick, servethehome, serve, the, home, austin, texas, collab, dc, ansible, genoa, amd, intel, xg8, micro, center, red shirt jeff, ipc, clustering, interview, sth
Id: UT5UbSJOyog
Length: 20min 53sec (1253 seconds)
Published: Wed Nov 23 2022