Is the MacBook Pro actually good for machine learning? Let's find out. Welcome back to the hardware series. This is going to be the first of two videos where we actually get to play with some hardware, first from the software perspective, and then from the actual touching-hardware perspective. We're going to start with the M1 chip that is in the new 2020 MacBooks, mostly because I have one, but also because there was a lot of hype when the M1 chip was originally announced, since it was designed to process things that require machine learning a lot faster than the standard chips you see in most computers. So I thought it would be fun to do a benchmarking video where we try out the M1 chip and compare it to standard computer CPUs as well as normal GPUs like Tesla V100s, and ideally also a TPU, if I can get my hands on one. If you want to check out the rest of the hardware series, I'll include the playlist up in the cards, and you should subscribe and turn on post notifications so that you can catch the last video in the series, as well as the NLP series I'll be doing next month, which will focus on natural language processing, from things like recurrent neural networks to transformers and GPT-3. So before we dive into code, let's talk
a little bit about what makes the M1 chip different from things like the standard Intel chips that you normally find in MacBook Pros. The chip was designed specifically for Apple computers, and interestingly enough, it's actually cheaper to produce than the Intel chips were for Apple to acquire, which makes the M1 computers a lot cheaper than the standard models of most Apple computers. One of the more interesting things about it is that, unlike some other computer architectures we've talked about in other videos, where things like your processor and your memory are separate chips that need to be connected, which introduces latency because we have to take extra time to go to that other chip and collect information before we can process it somewhere else, on the M1 chip all of those things are in the same place on the same chip, which lets you process information, execute instructions, and run programs a lot faster. Now, if you want a really deep dive into
the M1 chip and Apple in general, you should go check out Rene Ritchie's channel. He does great in-depth videos on all of this stuff, and if you want even deeper dives, you should check him out on Nebula. But in short, the M1 chip allows for
much faster processing of most things, because you don't have to go as far to access things like memory, since the memory and the processor are on the same chip. On top of that, there's also an eight-core GPU on the M1 chip, which in short allows you to run video games with less latency and generally process graphics a lot faster than you would otherwise. And then, of course, the reason I was interested in
making this video: the M1 chip features a 16-core Neural Engine, which is specifically designed to optimize things that require machine learning, or more specifically, things that require a lot of matrix multiplication.
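To make that concrete, here's a tiny sketch of the kind of operation we're talking about: the forward pass of a dense layer is, under the hood, mostly one big matrix multiply, which is exactly the workload a neural engine is built to accelerate. The shapes here are purely illustrative.

import tensorflow as tf

# A dense layer's forward pass is essentially one matrix multiply.
# Shapes are illustrative: a batch of 64 flattened 28x28 images.
x = tf.random.normal((64, 784))   # inputs
w = tf.random.normal((784, 128))  # weights of a 128-unit dense layer
b = tf.zeros((128,))              # biases

h = tf.nn.relu(tf.matmul(x, w) + b)  # the matmul dominates the cost
print(h.shape)  # (64, 128)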
Because of that, I wanted to see exactly how much better the M1 chip was, or wasn't, at running fairly simple machine learning models. So I ran a bunch of code, and we'll go through the results now. I was originally inspired to make this
video based on a Weights & Biases report that someone wrote up a couple of months ago that looked to run a similar benchmark. This video is a bit of an extension of that work, but I want to mention it now to give credit to the person who wrote it, because they did an awesome job on the initial benchmark. Additionally, if you're curious
about writing something like this yourself, I've included a Colab notebook in the description with the code that I used to run everything you'll see. The base code is essentially the same, but there were certain things that I had to tweak a little bit to accommodate different requirements for things like the TPU, and especially for the M1 chip, which I'll get into after we go through the actual performance. Starting off with things like accuracy, loss, and validation loss: I ran each model for 10 rounds and compared their outcomes. You can see that in most
cases, things like accuracy, loss, validation loss, and validation accuracy are fairly similar once you hit round 10. This isn't super surprising: this was an MNIST model trained on a relatively complex network for the problem. I intentionally designed a problem that I thought would be relatively easy for a model to solve, so I didn't have to worry as much about making comparisons where one model barely performed and another one knocked it out of the park.
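For reference, here's a minimal sketch of the kind of benchmark being described, assuming an over-sized MNIST classifier trained for 10 rounds with Keras; the actual Colab notebook linked in the description may differ in architecture and logging details.

import time
import tensorflow as tf

# Load and normalize MNIST.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A deliberately over-sized network for MNIST, so every run converges
# to similar accuracy and we can focus on the hardware differences.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

start = time.perf_counter()
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
print(f"wall-clock training time: {time.perf_counter() - start:.1f}s")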
Getting on to something a little more interesting, we can look at the system logs for each of these runs. What we're comparing here is M1 GPU-only, M1 eager mode, and a V100, that is, a Tesla V100 GPU. Running GPU-only is something that
you have to force the chip to do. By default, it goes into eager
mode, which essentially allows the software layer behind your computer to figure out what the optimal distribution of tasks across the GPU and CPU is. When I ran those trials, I found that the computer predominantly relied on the CPU, so I wanted to see what would happen if I ran everything on the GPU itself.
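On the tensorflow_macos fork that was current at the time, forcing a device looked roughly like this; the exact module path is specific to that alpha release and may change in later versions.

import tensorflow as tf
# Specific to Apple's tensorflow_macos alpha; not part of stock TensorFlow.
from tensorflow.python.compiler.mlcompute import mlcompute

# 'any' (the default) lets ML Compute distribute work across CPU and GPU;
# 'gpu' forces everything onto the M1's integrated GPU, 'cpu' onto the CPU.
mlcompute.set_mlc_device(device_name="gpu")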
Interestingly, what you can see here is that the V100 is still the better GPU, in effect because it has more processing power than the M1 chip. If we start by looking at these
three graphs on the top, we can see that the Tesla V100 generally uses more power overall but completes our training task in much less time. This isn't super surprising, mostly because when you run this through Colab on a Tesla V100, you have access to more power than you do through a laptop. Interestingly, you can also see that,
although the power usage as a percentage is higher for the V100, the actual percentage of memory allocated is lower for the V100 compared to the M1 chip. This isn't super surprising, which we'll get to in the next row of graphs: if you look at the GPU utilization, you'll see that in general, for the M1 chip, we're using basically all of our GPU resources to train this model. And as I mentioned earlier, because the heat tolerance of this laptop is lower, it generally stays at a relatively consistent temperature, slightly above room temperature, which isn't super surprising, because if it were to get hotter than that, we'd have a problem. As a quick aside, we can also talk
a little bit about training time. The four comparisons that I'm making here are the M1 GPU-only, the M1 in eager mode, a Tesla V100, and a Colab CPU, that is, Colab running without any sort of hardware acceleration. The V100 trains this model in just over a minute, whereas the M1 in eager mode took about five minutes to make it through all 10 rounds. And the M1 in GPU-only mode actually took longer, which I thought was interesting; that wasn't something I necessarily expected. My guess is that eager mode also kicks in the Neural Engine, which is likely what allows it to run faster than GPU-only mode. Interestingly, if you look at CPU
utilization, you'll see that the utilization for eager mode is actually below the utilization for GPU-only. This is likely because in eager mode, you're not actually building the graph associated with the network in order to run it. I'll be doing a video on graph neural networks in the next month or so, so stay tuned for an explainer on that, but essentially, building a graph takes time and CPU processing power, so we're likely using up more of the CPU for that. Then, once the graph is built, we can see that usage go back down to something comparable to the M1 chip.
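As a rough illustration of that graph-building cost, in standard TensorFlow you can see it by timing the first call to a tf.function, which traces the Python code and builds the graph, against later calls that simply reuse it; the exact numbers will vary by machine.

import time
import tensorflow as tf

x = tf.random.normal((1024, 1024))

@tf.function  # compiles the Python function into a TensorFlow graph
def step(a):
    return tf.matmul(a, a)

t0 = time.perf_counter()
step(x)  # first call: traces the function and builds the graph (CPU work)
t1 = time.perf_counter()
step(x)  # later calls: reuse the already-built graph
t2 = time.perf_counter()
print(f"first call (build + run): {t1 - t0:.4f}s, cached call: {t2 - t1:.4f}s")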
On the other hand, if you look at the Colab version, where we don't use any sort of hardware acceleration, you can see that we're basically using the entire CPU the entire time, which makes sense, and that in the V100 version, we're initially using a lot of the CPU. This is likely the initial setup of our model, collection of data, things like that. Then, as we actually train our model, you can see that drop pretty sharply down to about 50%, because at that point everything is being offloaded to the GPU and we don't need the CPU as much anymore.
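If you want to confirm in your own Colab run that work really is being offloaded to the GPU, stock TensorFlow can list the devices it sees and log where each operation is placed; a quick sanity check looks something like this.

import tensorflow as tf

# List the accelerators TensorFlow can see (an empty list means CPU-only).
print(tf.config.list_physical_devices("GPU"))

# Log which device each operation is placed on as the model runs.
tf.debugging.set_log_device_placement(True)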
If you'd like to do a deeper dive into the actual results, I'll link the report that I made in Weights & Biases for this. But in short, the M1 chip, especially if you use it in eager mode, which is the default setting, actually does speed up the processing time a lot, especially compared to no hardware acceleration: you go from something on the order of 25 minutes of run time to around five minutes, which is definitely a great improvement. So if I were to look at this information
without the broader context of the work that goes into running things on cloud servers, on M1 laptops, or on something without hardware acceleration, I would probably say that the Macs are actually a pretty good machine learning resource, especially for someone who isn't necessarily doing something like research and development at an industry level. I think that if you're working
at somewhere like Google, you're probably still going to use cloud
servers to run your models, because anything you're running is going
to be too big for a laptop anyway. But when it comes to actually running
and playing with models on your laptop, installing TensorFlow for Mac is a whole ordeal. In fact, if you follow me on Twitter, you probably saw me tweet that I spent four hours trying to figure out how to install it correctly. And that was four hours from someone who's fairly familiar with weird and complicated installations of software that isn't particularly mainstream or might still be in beta. So if you're new to machine learning
and it's something that you're looking to get into, and you're thinking about an M1 MacBook Pro because you think it will be a way to learn faster, I wouldn't necessarily recommend it for that, because the learning curve for installing TensorFlow for Mac is steep. Having said that, because it's such a new system, I would expect that by the time the next version of the M1 chips comes out, likely this summer, we'll see more established setups so that this whole installation process isn't such a pain. So now that we've looked at all
of those results, is this laptop actually good for machine learning? Well, I think my answer would be: not really, at least not yet. The M1 system is useful for programs that have been optimized for the chip itself, things like Pixelmator Pro or Final Cut, which are designed to take advantage of the resources the chip offers. However, most programs aren't
actually optimized for the M1 chip yet, and instead use a system called Rosetta, which essentially allows them to run as normal on M1 computers but doesn't take advantage of any of the optimization. Additionally, installing the required libraries to actually do machine learning on the M1 Macs is nontrivial, too.
The main current system is TensorFlow for macOS, which allows you to use ML Compute, the library that lets you actually use the optimization that comes with the M1 chip. It's still in the process of being developed; it's effectively in beta, so it's still a little bit buggy, and there are still some confusing issues with it. It can only run on certain versions of Python, and you have to make sure that the version of Python you're running is actually built for use with the M1 chip, so there are a lot of things you have to figure out in the process of installing it.
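For anyone attempting the install, a quick environment sanity check like the following can save some of those hours; 'arm64' means your Python is running natively on the M1, while 'x86_64' means it's going through Rosetta translation, and the tensorflow_macos alpha targeted a specific Python version (3.8, to my knowledge).

import platform
import sys

print(sys.version)         # the tensorflow_macos alpha targeted Python 3.8
print(platform.machine())  # 'arm64' = native Apple Silicon Python;
                           # 'x86_64' = running under Rosetta translation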
overflow got me through. I could certainly see this being
challenging for anyone who is new to command line program. Plus the power of them. One ship is limited. It's not a cloud. Yeah. Use something like a Tesla V 100. And so you can't necessarily run
large models on this or train large models on this because you'd likely
run into memory errors in short. I'm certainly looking forward to
playing more with the M1 chip and seeing how much I can push it and what I can use it for. Apple's been developing this interesting developer program for machine learning people, so there have been a lot of great releases that can help you do more machine learning on things like your phone and your laptop. Hopefully I'll be making videos
about that in the future, but as a research and development system, if
you're interested in machine learning research, I'd say it's definitely
a step in an interesting direction. As we've talked about, new computer architectures have been of particular interest to the machine learning research community, because we're running up against the compute barrier of most standard systems. But I don't think you're going to be running something like GPT-3 on this anytime soon.