Machine Learning on the M1 MacBook Pro?

Video Statistics and Information

Captions
Is the MacBook Pro actually good for machine learning? Let's find out.

Welcome back to the hardware series. This is going to be the first of two videos where we actually get to play with some hardware, first from the software perspective, and then from the actually-touching-hardware perspective. We're going to start with the M1 chip that is in the new 2020 MacBooks, mostly because I have one, but also because there was a lot of hype when the M1 chip was originally announced, since it was designed to process things that require machine learning a lot faster than most standard chips you see in computers. So I thought it would be fun to do a benchmarking video where we try out the M1 chip and compare it to standard computer CPUs as well as normal GPUs like Tesla V100s, and ideally also a TPU, if I can get my hands on one. If you want to check out the rest of the hardware series, I'll include the playlist up in the cards, and you should subscribe and turn on post notifications so that you can see the last video in the series, and so that you can follow the NLP series I'll be doing next month, which will focus on natural language processing, from things like recurrent neural networks to transformers and GPT-3.

So before we dive into code, let's talk a little bit about what makes the M1 chip different from things like the standard Intel chips that you normally find in MacBook Pros. The chip was designed specifically for Apple computers, and interestingly enough, it's actually cheaper for Apple to produce than the Intel chips were for Apple to acquire, which makes the M1 computers a lot cheaper than the standard models of most Apple computers. One of the more interesting things about it is how it handles memory. In some other computer architectures, which we've talked about in other videos, things like your processor and your memory are separate chips that need to be connected.
Those connections introduce latency, because we have to take extra time to go to that other chip and collect information before we can process it somewhere else. On the M1 chip, all of those components are in the same place on the same chip, which lets you process information, execute instructions, and run programs a lot faster. Now, if you want a really deep dive on the M1 chip and Apple in general, you should go check out Rene Ritchie's channel; he does great in-depth videos on all of this stuff, and if you want even deeper dives, you should check him out on Nebula. But in short, the M1 chip allows for much faster processing of most things, because you don't have to go as far to access things like memory, since the memory and the processor are on the same chip. On top of that, there's also an eight-core GPU on the M1 chip, which in short allows you to run video games with less latency and generally process graphics a lot faster than you would otherwise. And then, of course, the reason I was interested in making this video: the M1 chip features a 16-core Neural Engine, which is specifically designed to optimize things that require machine learning, or more specifically, things that require a lot of matrix multiplication. Because of that, I wanted to see exactly how much better the M1 chip was, or wasn't, at running fairly simple machine learning models. So I ran a bunch of code, and we'll go through the results now.

I was originally inspired to make this video by a Weights & Biases report that someone wrote up a couple of months ago that ran a similar benchmark. This video is a bit of an extension of that work, but I want to mention it now to give credit to the person who wrote it, because they did an awesome job on the initial benchmark. Additionally, if you're curious about writing something like this yourself, I've included a Colab notebook in the description with the code that I used to run everything.
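To make the "a lot of matrix multiplication" point concrete: the core operation in a neural network layer, and the thing a unit like the Neural Engine is built to accelerate, is a matrix multiply. Here's a minimal NumPy sketch; the layer sizes are purely illustrative, not taken from the benchmark:

```python
import numpy as np

rng = np.random.default_rng(0)

# One dense layer's forward pass is essentially a single matrix multiply:
# y = x @ W + b. Illustrative sizes: a batch of 32 flattened 28x28 images
# (784 features) passing through a 128-unit layer.
x = rng.standard_normal((32, 784))   # input batch
W = rng.standard_normal((784, 128))  # layer weights
b = np.zeros(128)                    # layer biases

y = x @ W + b  # the matrix multiplication hardware accelerators target
print(y.shape)  # -> (32, 128)
```

Training repeats operations like this millions of times, which is why hardware that speeds up matrix multiplies speeds up machine learning across the board.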
The base code you'll see is essentially the same, but there were certain things I had to tweak a little bit to accommodate different requirements for things like the TPU, and especially for the M1 chip, which I'll get into after we go through the actual performance.

Starting off with things like accuracy, loss, and validation loss: I ran each model for 10 rounds and compared their outcomes. You can see that in most cases, accuracy, loss, validation loss, and validation accuracy are fairly similar once you hit round 10. This isn't super surprising; this was an MNIST model trained on a relatively complex network for the problem. I intentionally designed a problem that I thought would be relatively easy for a model to solve, so I didn't have to worry as much about making comparisons where one model barely performed and another one knocked it out of the park.

Getting on to something a little bit more interesting, we can look at the system logs for each of these runs. What we're comparing here is M1 GPU-only, M1 eager mode, and a V100, meaning a Tesla V100 GPU. Running GPU-only is something you have to force the chip to do; by default, it goes into eager mode, which essentially lets the system figure out the optimal distribution of tasks across the GPU and CPU. When I ran those trials, I found that the computer predominantly relied on the CPU, so I wanted to see what would happen if I ran everything on the GPU itself. Interestingly, what you can see here is that the V100 is still a better GPU, in effect because it actually has more processing power than the M1 chip. So if we start by looking at these three graphs on the top, we can see that the Tesla V100 generally uses more power overall, but completes our training task in much less time.
This isn't super surprising, mostly because when you run this through Colab on a Tesla V100, you have access to more power than you do through a laptop. Interestingly, you can also see that although the power usage as a percentage is higher for the V100, the actual percentage of memory allocated is lower for the V100 compared to the M1 chip. This isn't super surprising either, as we'll see in the next row of graphs: if you look at the GPU utilization, you'll see that in general, for the M1 chip, we're using basically all of our GPU resources to train this model. And as I mentioned earlier, because the heat tolerance of this laptop is lower, it generally stays at a relatively consistent temperature slightly above room temperature, which isn't super surprising, because if it were to get much hotter than that, we'd have a problem.

As a quick aside, we can also talk a little bit about training time. The four comparisons I'm making here are the M1 GPU-only, the M1 eager mode, a Tesla V100, and a Colab CPU, meaning Colab running without any sort of hardware acceleration. The V100 trains this model in just over a minute, whereas the M1 in eager mode took about five minutes to make it through all 10 rounds. The M1 in GPU-only mode actually took longer, which I thought was interesting; that wasn't something I necessarily expected. My guess is that when I specifically limit the model to the GPU, it also cuts out the Neural Engine, which was likely what allowed the eager mode to run faster. Interestingly, if you look at CPU utilization, you'll see that the utilization for eager mode is actually below the utilization for GPU-only. This is likely because in eager mode, you're not actually building the graph associated with the network in order to run it. And I'll be doing a video on graph neural networks in the next month or so.
So stay tuned for an explainer on that. But essentially, building a graph takes time and CPU processing power, so we're likely using up more of the CPU for that, and then, once the graph is built, we see CPU utilization go back down to something comparable to the eager-mode run. On the other hand, if you look at the Colab version, where we don't use any sort of hardware acceleration, you can see we're basically using the entire CPU the entire time, which makes sense. And in the V100 version, we're initially using a lot of the CPU; this is likely the initial setup of our model, collection of data, things like that. Then, as we actually train our model, you can see that drop pretty sharply down to about 50%, because at that point everything is being offloaded to the GPU and we don't need the CPU as much anymore.

If you'd like to do a deeper dive into the actual results, I'll link the report that I made in Weights & Biases for this. But in short, the M1 chip, especially if you use it in eager mode, which is the default setting, actually does speed up the processing time a lot, especially compared to no hardware acceleration: you go from something on the order of a 25-minute runtime to around five minutes, which is definitely a great improvement. So if I were to look at this information without the broader context of the work that goes into running things on cloud servers, or on M1 laptops, or on something without hardware acceleration, I would probably say that these Macs are actually a pretty good machine learning resource, especially for someone who isn't necessarily doing research and development at an industry level. I think that if you're working somewhere like Google, you're probably still going to use cloud servers to run your models, because anything you're running is going to be too big for a laptop anyway. But when it comes to actually running and playing with models on your laptop, installing TensorFlow for Mac is
a whole ordeal. In fact, if you follow me on Twitter, you probably saw me tweet that I spent four hours trying to figure out how to install it correctly, and that was four hours from someone who's fairly familiar with weird and complicated installations of software that isn't particularly mainstream or might still be in beta. So if you're new to machine learning, it's something you're looking to get into, and you're thinking about an M1 MacBook Pro because you think it will be a way to learn faster, I wouldn't necessarily recommend it for that, because the learning curve for installing TensorFlow for Mac is steep. Having said that, because it's such a new system, I would expect that by the time the next version of M1 chips comes out, likely this summer, we'll see more established setups, so that this whole installation process isn't such a pain.

So now that we've looked at all of those results, is this laptop actually good for machine learning? Well, I think my answer would be: yes, and not really, at least not yet. The M1 system is useful for programs that have been developed to take advantage of the chip itself, things like Pixelmator Pro or Final Cut Pro, which are designed to use the resources the chip offers. However, most programs aren't actually optimized for the M1 chip yet, and instead use a system called Rosetta, which essentially allows them to run as normal on M1 computers, but without taking advantage of any of the optimization. Additionally, installing the required libraries to actually do machine learning on the M1 Macs is nontrivial. The main current system is Apple's TensorFlow for macOS fork, which allows you to use ML Compute, the framework that actually lets you use the optimization that comes with the M1 chip. It's still in the process of being developed, effectively in beta, so it's still a little bit buggy, and there are still some confusing issues with it.
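For concreteness, here is roughly what the ML Compute hook looked like in Apple's tensorflow_macos fork, including the device pinning used for the GPU-only runs described earlier. The import path is specific to that fork (it does not exist in stock TensorFlow), so this sketch is guarded to fall back gracefully anywhere else:

```python
try:
    # This module only exists in Apple's tensorflow_macos fork,
    # not in stock TensorFlow.
    from tensorflow.python.compiler.mlcompute import mlcompute

    # 'any' was the default, letting ML Compute distribute work across
    # CPU and GPU (the "eager mode" behavior described above); 'cpu' and
    # 'gpu' force everything onto one device, which is how the GPU-only
    # benchmark runs were produced.
    mlcompute.set_mlc_device(device_name="gpu")
    device = "gpu"
except ImportError:
    # Not running the tensorflow_macos fork (e.g. an Intel machine or
    # stock TensorFlow), so the ML Compute hook isn't available.
    device = "unavailable"

print(device)
```

The fork has since been superseded by Apple's tensorflow-metal plugin, so treat this as a snapshot of the setup at the time of the video rather than current practice.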
This TensorFlow build only runs on certain versions of Python, and you have to make sure that the version of Python you're running is actually built for use with the M1 chip, so there are a lot of things you have to figure out in the process of installing it. And while a combination of prior knowledge and Stack Overflow got me through, I could certainly see this being challenging for anyone who is new to command-line programming. Plus, the power of the M1 chip is limited; it's not a cloud GPU like a Tesla V100, so you can't necessarily run or train large models on it, because you'd likely run into memory errors.

In short, I'm certainly looking forward to playing more with the M1 chip and seeing how much I can push it and what I can use it for. Apple's been developing an interesting developer program for machine learning people, so there have been a lot of great releases that can help you do more machine learning on things like your phone and your laptop, and hopefully I'll be making videos about that in the future. And as a research and development system, if you're interested in machine learning research, I'd say it's definitely a step in an interesting direction. As we've talked about, new computer architectures have been of particular interest to the machine learning research community, because we're running up against the compute barrier for most standard systems. But I don't think you're going to be running something like GPT-3 on this anytime soon.
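One footnote on those Python-version caveats: a quick way to sanity-check what you're actually running. The tensorflow_macos fork was pinned to Python 3.8 at the time, and on an M1 Mac, `platform.machine()` tells you whether your interpreter is a native Apple Silicon build or running under Rosetta 2 translation:

```python
import platform
import sys

# Interpreter version: Apple's tensorflow_macos fork required a
# specific Python version (3.8 at the time of this video).
py_version = sys.version_info[:2]

# 'arm64' indicates a native Apple Silicon build of Python; 'x86_64'
# on an M1 Mac means the interpreter runs under Rosetta 2 and won't
# see the M1-specific optimizations.
arch = platform.machine()

print(py_version, arch)
```

Both checks run on any platform, so they're a cheap first step before a long install debugging session.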
Info
Channel: Jordan Harrod
Views: 50,675
Keywords: apple m1, apple silicon, m1 macbook pro, m1 machine learning, m1 machine learning benchmark, apple m1 chip, apple silicon mac, apple silicon macbook pro, m1 macbook pro review, apple silicon macbook, macbook pro, davinci resolve 17, fcx, m1x, macbook pro 2021, macbook pro apple silicon, apple silicon performance, apple m1 benchmark, arm macbook pro, m1 macbook pro hands on, machine learning, computer architecture, m1, apple, intel, m1 macbook
Id: Y3INzc4EH60
Length: 13min 25sec (805 seconds)
Published: Mon Mar 22 2021