Cheap vs Expensive MacBook Machine Learning | M3 Max

Video Statistics and Information

Captions
I've got the cheapest M1 MacBook Air, and I'm comparing it with the really expensive M3 Max MacBook Pro. Today we'll explore their performance, starting with basic machine learning functions like matrix multiplication, and working up to model training and inference. But we're not just doing CPU stuff today. Oh no, we'll get into how the GPU affects performance, and also target that elusive Apple Neural Engine to see how much faster it'll be on the M3 Max.

My exploration today is inspired by Timothy Liu's blog post, and I'll show you his tools and his M1 Max MacBook Pro results, and add my modest M1 MacBook Air, my daily-driver M2 Max MacBook Pro, and the luxurious M3 Max MacBook Pro. And though this M3 Max is not made of gold, it might as well be: it's expensive. I've invested in these machines, set up the environments, and run these extensive tests to compare Apple's tech from the past few years in the power-hungry world of AI. I mean, uh, performance-hungry. Maybe power-hungry. Time will tell.

We're going to kick things off with CPU performance. For tech enthusiasts and software developers, a fast CPU means smooth multitasking and rapid code compilation, but for machine learning, matrix multiplication is one of the core functions a CPU needs to perform well. Timothy's results for his M1 Max were quite stunning, especially for single precision, or FP32, showing two teraflops for large matrix sizes. Well, hold on to your horses, Timothy, because I've got something to show you. In fact, I even had to increase the range of the chart here to accommodate the M3 Max results. In summary, the M3 Max MacBook Pro dominates across all precision levels, outshining both the M1 and the M2 Max. It also surpasses the AMD Ryzen 5600X, even with MKL (Intel's Math Kernel Library) enabled, and that's the machine Timothy used for comparisons.

Now, a central player in these calculations is a tool called NumPy, which you might have heard of. It's written in C and C++ to be really fast, but there is a little
trick that Apple silicon has up its sleeve, and that's hidden away in an AMX co-processor embedded in the CPU. Apple lets us tap into the AMX using the Accelerate framework, and if you compile NumPy to use Accelerate, things get even faster. Using the Accelerate framework significantly enhances the performance of both the M1 and M1 Max in single-precision tasks, while offering only moderate improvement in half- and double-precision tasks compared to the stock conda NumPy installation, which is the basic NumPy.

However, matrix multiplication is not the only thing that gets a boost here. NumPy's other key functions benefit too: random number generation, special functions, statistical functions, vector multiplication, SVD (whatever that is, I don't want it). The M3 Max MacBook Pro outperforms the M1 MacBook Air, of course, in these other NumPy functions as well, showing it's much faster, especially in tough tasks like eigendecomposition and matrix multiplication, which we've already seen. And this shows how much better the new M3 Max is compared to the older M1, and actually everything in between.

As a side note, I've created a video on how to compile NumPy for Accelerate, along with some of these tests, and it's available exclusively for members of the channel. Thank you to the members for your support; special content for you starts right now with that first video, which is already posted. If you're interested in joining and supporting this channel, the join button is right below.

Now, there should be no surprise with the CPU improvements there, but what about the GPU and all the additional cores that Apple keeps piling on? Just as a reminder, Apple silicon's GPU cores are built into the same SoC, or system on chip, as the CPU. This is not a separate, giant, power-hungry GPU card like the Nvidia RTX 3090, for example. Even so, the power consumption of Apple silicon GPUs has been going up: from the 7-core GPU on the M1 MacBook Air that I have, to the 32 cores on the M1 Max, to the 38 cores on the M2 Max, and now 40 cores on the M3 Max, which
can draw up to 53 watts of power. It's still nothing compared to the 450 watts that a 4090 card can use, though.

Now, to test the GPU, you can use the Metal APIs directly, but we'll use Timothy's scripts in the tf-metal-experiments repository, which use Metal via TensorFlow, making it pretty easy to integrate into existing ML workflows. By the way, I made my own forks of Timothy's repositories, which I'll link down below, and you can navigate wherever you want from there. I've added my own tweaks and results there; that's why I made my own fork.

So how many more teraflops can we squeeze out if we switch to the GPU? A lot. Timothy's M1 Max got about 8 teraflops, while the M2 Max and the M3 Max got into the low teens. Our little M1 Air doesn't do so well here, reaching just 1.5 teraflops. We love you; you're a good little machine.

So how does this translate to machine learning? Pretty directly, actually. The M1 Max shows quite decent performance for training deep learning models, and I ran ResNet-50 here, which is a convolutional neural network. Is it like an RTX 3090? Well, it depends what you're comparing. The M1 Max understandably has lower throughput overall, but it demonstrates impressive efficiency, offering comparable performance per watt. And my machines were all over the place here, with the little M1 managing to crunch through with just 8 GB of memory, even though the ResNet-50 training requires 21 GB. We're not going to look at the swap. Okay, maybe just a peek. Oh God, the swap.

Now, how about that Apple Neural Engine, ANE for short, or even shorter, NE? This part of the SoC is pretty undocumented, but after I poked around for a bit, we can have a better guess at what Apple wants to do with it. After all, Apple's very own MLX framework, which was just released, doesn't seem to even take advantage of the ANE, so it makes you wonder. To tap into the ANE, we need to use Apple's Core ML, or in the case of Python, the convenient coremltools library. We can use this to convert models to the correct
format that will take advantage of the ANE. Let's start with our good friend, matrix multiplication. For this test I wanted to observe the different parts of Apple silicon at work, and with Core ML you're able to target different compute units: you can say CPU only, CPU and GPU, or CPU and NE. So here in the code, using the Core ML matmul script, we're converting the model and passing in the compute unit.

Let's see what this looks like when we do CPU only, just for fun, so we can get a comparison. While it's running, I'm using asitop, which is actually a tool that Timothy wrote himself; we've been using it a lot here, and a lot of other YouTubers have been using it too. Notice how the P-CPU usage is at 50%, there's almost no GPU usage, and no ANE usage at all; it's only the CPU. And we've got TOPS, tera-operations per second, at 0.23. Not that great. Frames per second, by the way, is 13.7, meaning it can process 13.7 images per second. Not good.

Let's change it to CPU and GPU. Back to asitop. Look at that: GPU usage jumping up to almost 100%, CPU is pretty low, ANE is zero. We just have the efficiency cores of the CPU working; most of this work is being done by the GPU. There we go, that's a much better number right there: 631 frames per second.

What about that ANE? Here it's called CPU and NE; let's run that one. Now keep an eye on this ANE reading right here. Look at that: whoa, 170%! Holy cow, that went off the scale. Okay, so the GPU is not being used, and the P cores and the E cores are barely being used (the P cores are the ones we care about), but the ANE is being used, at 9.5 watts of power. It's sipping power compared to the GPU and the CPU. Tera-operations per second: 12.5. Frames per second: 727.5. Mamma mia. Here I was surprised to find that the result is actually less than on the M2 Max, and I did run this a couple of times. I was expecting more but got less; not sure why.

Now it's time to put everything together and see how the ANE performs in a real-world execution of a
convolution operation, using the ResNet-50 model. The test I'm running is the Core ML inference benchmark, and we're using the CPU-and-NE setting here. Ready to go, and boom, there they go. Oh, the M3 Max finished first: 648.7 samples per second. The M1 finished second (by the way, this is not really a race): 531.8 samples per second on that one. And finally, 640 samples per second on the M2 Max; the M2 Max not far behind the M3 Max here. I reran this on the GPU instead, and the results are considerably slower: the M1 at 13.7, the M2 Max at 333, and the M3 Max at 44. Now, that was a batch size of just 1, and the Neural Engine seems to do much better, according to Timothy's blog post, at any batch size less than 32. I reran the test with a batch size of 256, and the GPU did much better; well, everything except the M1, which surprisingly did really well on the Neural Engine at both batch size 1 and batch size 256.

While the M1 MacBook Air stands out for its efficiency and affordability, it falls short in handling the demanding tasks of advanced machine learning, really underscoring the divide between entry-level and high-end Apple silicon machines. On the other end of the spectrum, the M3 Max MacBook Pro emerges as a powerhouse capable of tackling complex machine learning workflows, especially with the available 128 GB of memory (mine only has 64). It may not be as fast as an RTX 4090, but it's more reliable for larger models thanks to the unified memory architecture. For instance, a model like Mixtral 8x7B, which was just recently released, needs about 100 GB of memory to run, and that challenges even the 128 GB Macs. My own attempts to run it on a 64 GB machine were unsuccessful, and it's a tight squeeze even on the 128 GB versions, provided you turn off everything else on the machine.

Apple's MLX framework, which was just released, targets machine learning workflows, and it really shows that Apple is serious about taking on the AI field and making it "accessible". I put that in quotes because, well, high-end Macs are expensive, and will continue to be
more and more expensive. But when you compare them to the Nvidia A100 GPU, for example, with 80 GB of RAM and a price tag around $18,000, well, these start to look like a bargain. Check out my previous video for a deep dive into machine learning, where I compare MacBooks, the Mac Studio, and even Nvidia cards, and subscribe for more insights in my upcoming content. I'll see you later, folks. [Music]
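As an aside for readers who want to try the CPU test themselves: the matrix-multiplication benchmark described in the video can be sketched in a few lines of NumPy. This is a minimal sketch, not Timothy's actual script; the matrix sizes and repeat count here are arbitrary choices. The teraflops figures come from the standard 2n³ operation count for multiplying two n×n matrices.

```python
import time

import numpy as np


def matmul_gflops(n: int, dtype=np.float32, repeats: int = 5) -> float:
    """Time an n-by-n matrix multiply and return the best GFLOPS figure.

    A dense matmul costs ~2*n**3 floating-point operations, so
    GFLOPS = 2*n**3 / (best_elapsed_seconds * 1e9).
    """
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b  # the operation being benchmarked; result discarded
        best = min(best, time.perf_counter() - start)
    return 2.0 * n**3 / (best * 1e9)


if __name__ == "__main__":
    for n in (256, 1024, 2048):
        print(f"n={n:5d}  fp32: {matmul_gflops(n):8.1f} GFLOPS")
```

On an Accelerate-linked NumPy build on Apple silicon, the large FP32 sizes are where the multi-teraflop (thousands of GFLOPS) results come from; `np.show_config()` will tell you which BLAS library a given NumPy installation is using.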
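The "other NumPy functions" comparison (random number generation, SVD, eigendecomposition, and so on) can be approximated with a simple timing harness like the one below. It is a rough sketch with arbitrary sizes, not the benchmark suite used in the video.

```python
import time

import numpy as np


def time_op(fn, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


n = 512
rng = np.random.default_rng(0)
m = rng.random((n, n), dtype=np.float32)
sym = m @ m.T  # symmetric matrix, as required for eigh

results = {
    "random generation": time_op(lambda: rng.random((n, n))),
    "svd": time_op(lambda: np.linalg.svd(m, compute_uv=False)),
    "eigendecomposition": time_op(lambda: np.linalg.eigh(sym)),
    "matrix multiply": time_op(lambda: m @ m),
}

for name, secs in results.items():
    print(f"{name:20s} {secs * 1e3:8.2f} ms")
```

Whether the linear-algebra calls here hit the AMX depends entirely on the BLAS/LAPACK library NumPy was compiled against, which is exactly why the Accelerate-built NumPy pulls ahead in these tests.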
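For the Core ML compute-unit targeting described in the video, the relevant knob in `coremltools` is the `compute_units` argument to `ct.convert`. The sketch below is illustrative rather than the exact script from the video: it assumes you already have a traced PyTorch model and that `coremltools` is installed (macOS only), so the conversion is wrapped in a function instead of running at import time, and `convert_for_unit` is a hypothetical helper name.

```python
# Compute-unit names as exposed by coremltools.ComputeUnit:
COMPUTE_UNITS = ["CPU_ONLY", "CPU_AND_GPU", "CPU_AND_NE", "ALL"]


def convert_for_unit(traced_model, input_shape, unit_name="CPU_AND_NE"):
    """Convert a traced PyTorch model to Core ML, pinned to one compute unit.

    Requires coremltools on macOS; `traced_model` is assumed to be the
    output of torch.jit.trace.
    """
    import coremltools as ct  # imported lazily so this module loads anywhere

    return ct.convert(
        traced_model,
        inputs=[ct.TensorType(shape=input_shape)],
        # ct.ComputeUnit.CPU_AND_NE routes eligible layers to the Apple
        # Neural Engine; CPU_ONLY and CPU_AND_GPU work the same way.
        compute_units=getattr(ct.ComputeUnit, unit_name),
    )
```

Something like `mlmodel = convert_for_unit(traced, (1, 3, 224, 224), "CPU_AND_NE")` followed by `mlmodel.predict(...)` is what should light up the ANE gauge in asitop.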
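A quick sanity check on the asitop readings quoted in the transcript: TOPS divided by frames per second gives operations per frame, which should be roughly constant across compute units, since it's the same model doing the same work per image. The two (TOPS, FPS) pairs reported in the video do in fact agree to within a few percent:

```python
# (TOPS, frames/sec) pairs reported in the video for the same matmul model
readings = {
    "CPU only": (0.23, 13.7),
    "CPU + NE": (12.5, 727.5),
}

# ops per frame = (tera-ops/sec) / (frames/sec); both land near 1.7e10
ops_per_frame = {
    name: tops * 1e12 / fps for name, (tops, fps) in readings.items()
}

for name, ops in ops_per_frame.items():
    print(f"{name:10s} ~{ops:.3g} ops per frame")
```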
Info
Channel: Alex Ziskind
Views: 87,097
Keywords: apple silicon, machine learning, ml, ai, m2 max, m1 macbook air, m3 max macbook pro, apple neural engine, numpy, accelerate framework, gpu performance, m1 vs m3 comparison, tensorflow metal, core ml, apple ml frameworks, m1 mac ai, m3 max ai, m1 vs m3 machine learning, m2 max macbook pro, apple silicon ml performance, deep learning mac, machine learning on mac, python ml, numpy accelerate, m1 air efficiency, m3 max power, m1 vs m2 vs m3, m1 air vs m3 max, apple amx
Id: snRdjD0w-hw
Length: 11min 6sec (666 seconds)
Published: Wed Dec 27 2023