What type of GPU should you buy in 2021? That's the topic of this video. Just to set the scope, I am covering GPUs for either a desktop system or a laptop, whether that's a desktop you build yourself or one you get from an OEM, and specifically for deep learning. I will focus on NVIDIA, because NVIDIA is supported out of the box by TensorFlow, PyTorch, and similar frameworks. Also, if you're using a GPU in the cloud, which is often what I'm developing for from a desktop or laptop, NVIDIA is the type of GPU that is available on cloud systems. A GPU, a graphics processing unit, was traditionally for video games, but the math is very similar between video games and neural networks, so they've really found a home in deep learning as well. By the way, this is an older GPU; it's mainly a prop for the YouTube video, but it has all the right connectors that I want to talk about in terms of what you should be looking for when you are choosing a GPU for deep learning. We're going to talk specifically about memory, about cooling, whether more than one GPU will help what you're working on or not, and the different lines of NVIDIA GPUs, because there are a lot of choices. And 2021 has brought the introduction of the Ampere line, and that changes everything. So one of the first
things that I found somewhat confusing when I was reading about GPUs is just the different ways
that they cool. You will generally hear the cooling of a GPU referred to as either blower
or fan, and this has a lot to do with how densely you can pack these GPUs into a system. A blower, which is the type of GPU I'm holding here, takes air in through a fan-like intake and sends it out the end of the card, so it's blowing the air completely outside of the system. A fan GPU, on the other hand, is going to have two fans here that either circulate air across the card or simply draw air through and blow it out into the system case. If you're going to have two of these together, that becomes an important consideration, because if you pack them literally right next to each other in a fan configuration, one card is going to be blowing hot air into the other, or drawing the other card's hot exhaust into itself. So that becomes a bit of an issue if you're mounting two fan-type GPUs; usually they're mounted more spread out in a case, and then possibly with a link connecting them. And speaking of NVLink, a
lot of GPUs will have a connector up here where you can put a bridge, so to speak, between the two cards. That doesn't combine them into the same physical device, but it provides a link that can transfer data between the memories of the two GPUs very efficiently. This was used a lot in video games, where you would put two GPUs together and try to have the game see them almost as one processing unit. In deep learning and machine learning configurations you might often run without that link; it's not necessarily needed if you're primarily using the cards for their parallel processing capabilities. I've used systems both with and without NVLink. It's definitely not a requirement, and not all GPUs even have it; we'll get into that in a moment.
The other connector that's very important on a GPU is the power. This just keeps getting wider and
wider on GPUs as they need more and more wattage coming from the power supply to power them.
Another thing I definitely want to mention: I have not worked with every GPU setup out there, and this video is as much a discussion as it is me showing you some of the things that have worked well for me. So if you've used GPUs in parallel in ways different from what I describe in this video, please post in the comments. I would love to hear what you think, and if you disagree with anything I have in here, definitely let me know in the comments.
First let's talk about memory, because in my opinion this is probably the most important factor
in choosing a GPU for deep learning. This is a chart from NVIDIA's StyleGAN2 documentation, and I work with StyleGAN quite a bit for image generation, so I'm very interested in having enough memory to train the various GANs that I want to deal with. It's also a good representation of memory needs for a lot of other applications: you need enough memory on the GPU to hold your neural network and one batch. Now, you can play around with that a bit; you can use simpler neural networks or decrease your batch size, and that might help. But just using the basic configurations from the paper, you'll see that the 1024 by 1024 GAN, the one that produces those nice realistic faces you've seen, trained from scratch for the full 25,000 kimg schedule, is going to require 13.3 gigabytes on the GPU. Now, if you've got multiple GPUs, it requires that
much on each of them; it's not like the memory combines. Now, if you're willing to scale the configuration down, with smaller batch sizes and other changes, you can get that down to 8.6 gigabytes, or if you really drop your resolution, down to 6.4. You could probably go to even lower batch size configurations and get that down a bit more, but the point is that with less memory you're going to be compromising and scaling back the neural networks that you're
trying to create. If you're building your neural networks from scratch and defining how many layers and so on, then you're setting the size yourself, and you'll probably fit it to the GPU. But once you hit that memory ceiling, you error out and stop; it's not like it just runs slower. Not having enough GPU compute makes things slower, and we'll get into that in a moment, but not having enough memory stops you outright.
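By the way, if you're not sure how much memory your current card has, or how much of it a training run is actually using, you can query it directly. Here's a minimal sketch using PyTorch (assuming a CUDA-capable NVIDIA card; the same information is also available from the nvidia-smi command line tool):

```python
import torch

# Report total and currently allocated memory for each visible NVIDIA GPU.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    total_gb = props.total_memory / 1024**3
    allocated_gb = torch.cuda.memory_allocated(i) / 1024**3
    print(f"GPU {i}: {props.name} | {total_gb:.1f} GB total | "
          f"{allocated_gb:.1f} GB allocated")
```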
So let's look at some of the common GPUs, remembering that roughly 13 and 6.4 gigabytes are the thresholds you need to hit. Here you can see the current 30 series, the Ampere line: 24 gigabytes on the 3090, the big one; 10 gigabytes on the 3080, the next one down; and then you're right on the edge with the 8 gigabyte cards. You could certainly run the smaller GANs, the 256 by 256 ones, but anything bigger and you're going to have trouble even with the 10 gigabytes. So you can tell the 3090 is really the one that NVIDIA created for deep learning. I don't know that it's going to do a whole lot more for a gamer versus the 3080; it's got more cores, so it probably would be a better gaming card, but if I were going to buy a 30 series I would need the 3090 just on memory to do some of the things that I want to do. The 10 gigabytes would generally work, but for the kinds of things I do I would be running into the ceiling a lot on the 3080. Now, if you're just getting into deep learning, you can probably scale your needs back and fit into either of these; I'm glad to see even the 8 gigabytes on those lower cards. If you're constructing basic neural networks, like the ones in my deep learning course, you'd probably be fine with 8 gigabytes.
Now, when you're talking Quadro, memory is not generally a problem. If you look at the quick specs on the new Ampere-based A6000, it's got 48 gigabytes of RAM. I mean, that's insane; that is well beyond most things that I would need. For something like BlazingSQL or Dask, where you're parallelizing work across your GPUs with RAPIDS, the more memory you can throw at it, absolutely the better. Quick specs on the Quadro RTX 5000 put it at 16 gigabytes, so the Quadro line jumps up pretty quickly to the class of RAM that you need for deep learning problems.
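To give a flavor of that kind of workload, here's a minimal sketch of spinning up one Dask worker per GPU with RAPIDS's dask-cuda package and cuDF. This assumes RAPIDS is installed, and the file pattern and column names are just placeholders:

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf

# Start one worker per visible GPU; each worker keeps its share of the
# data in that GPU's memory, so more memory per card means bigger shares.
cluster = LocalCUDACluster()
client = Client(cluster)

# Hypothetical dataset and columns -- replace with your own data.
df = dask_cudf.read_csv("transactions-*.csv")
print(df.groupby("customer_id")["amount"].sum().compute())
```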
Now, let's talk about using multiple GPUs together. There are several considerations here: are you buying very high-end GPUs just to parallelize and do more and more, or do you want an entry-level GPU that you can maybe add a second one to later as your needs grow? We'll talk about both of these scenarios. There are a lot of different ways you can use multiple GPUs, but the two primary ways, and I'm pulling this from the Keras/TensorFlow documentation, and PyTorch operates pretty similarly, are data parallelization and model parallelization. Data parallelization is where you copy the same model onto every GPU and split each batch of data across them, and you're really just using the extra GPUs to speed up what you would have done with just one GPU. This is what I use when I'm running StyleGAN2-ADA
to train faster, so that I can get that model done much quicker than I could with a single GPU. When you're doing data parallelization across two GPUs, their memories don't really combine, so you can't take on anything bigger than what a single GPU has the memory to handle. Model parallelization, and I will say this is a bit more rare, I have worked primarily with data parallelization, is when you want multiple GPUs to act as members of the same neural network. This is usually for a crazy big neural network that just would not fit on one GPU to begin with. Now, if you have smaller GPUs with less memory, you could use this to train a larger model than you would normally have had the memory for, and there might be some advantages to that. However, that's not the usual case, and it would require some engineering on your part to get it really set up properly.
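To make the data-parallel case concrete, here's a minimal sketch of what it looks like in TensorFlow with Keras, using MirroredStrategy. The model and layer sizes are just placeholders; the point is that the model is built inside the strategy scope and every GPU gets a full copy of it:

```python
import tensorflow as tf

# Data parallelization: replicate the model onto every visible GPU and
# split each batch across them. Training speeds up, but memory does not
# pool -- the whole model still has to fit on each individual card.
strategy = tf.distribute.MirroredStrategy()
print("Number of GPUs in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# model.fit(x_train, y_train, batch_size=256, epochs=5)  # then train as usual
```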
So looking at the GPUs, and we'll see more on this in just a moment, the 3090 is actually the only one of the 30 series that supports NVLink. Now, you can put multiples of any of these cards in a system and do data parallelization; you just can't move data between them particularly fast. Now, I have not tried this, but I believe it would be theoretically possible to mix different types of GPU in one system. Your workloads are going to take different amounts of time on each card, and again you're going to need to do some engineering to make it all work out correctly. But if I were looking at this and wanted to buy something now and upgrade later, I might think about a 3080 with its 10 GB of RAM. If my model doesn't fit in that, buying a
second 3080 is not necessarily going to help me. An interesting experiment, and if anybody has any opinions on whether this would work, please share: maybe you have more money when you're buying the second GPU, so you bought a 3080 and you add a 3090 to it. You could potentially use that 3090 on its own when you need the bigger RAM, and use both of them together just to speed up training when the 10 GB is enough to fit your model. Like I said, I have not tried that; it would be a bit unorthodox, but I think in theory it might work if you do some work on your engineering and pipeline.
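If you did go down that road, one simple way to make use of the larger card on its own is to pin the big model to it explicitly. This is a hypothetical sketch in TensorFlow; the device index and the model are placeholders, and the ordering of the GPUs depends on your machine:

```python
import tensorflow as tf

# See which GPUs TensorFlow can see; on a mixed system the 3080 might
# show up as GPU:0 and the 3090 as GPU:1 (ordering varies by machine).
print(tf.config.list_physical_devices("GPU"))

# Pin a memory-hungry model to the larger card only.
with tf.device("/GPU:1"):
    big_model = tf.keras.Sequential([
        tf.keras.layers.Dense(4096, activation="relu", input_shape=(2048,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
```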
But with any of these, if you want to buy a second one at some point, remember that you're only going to get NVLink on the top-end cards. Adding a second card isn't going to let you handle bigger models, but it will pretty easily let you train nearly twice as fast as before. And will it really train twice as fast? Let's go back to this chart. Notice the training times here: to do the state-of-the-art StyleGAN faces, eight GPUs takes nine days. I mean, this is heavy-duty GPU training to truly build the state-of-the-art face GAN. But look, it scales pretty linearly: four GPUs took about twice as long, and it pretty much doubles again with two, and again with one. So this is nice scalability; you can literally just throw more GPUs at it. And that is eight GPUs on a single system; most systems that you will see will have only two GPUs in them. I know on the Lenovo
P920 line you can put a third GPU into the system. But if you start to go up to four or eight, you're probably dealing with one of the NVIDIA DGX systems, or you've really done some serious engineering to get that many GPUs onto an actual motherboard. The other thing I'm going to mention is NVLink. It's not a necessity, and I have worked on systems more without it than with it. When you're working in the cloud it's generally there, because the AWS EC2 instances I've worked with make it available. If you're doing something like RAPIDS or BlazingSQL, things that actually support it out of the box, NVLink will considerably speed things up. It's supported in TensorFlow, but you have to specifically configure it to take advantage of NVLink, with a setting like the one I'll show in a moment, and do some engineering in your model, so it's not necessarily going to do a great deal for you right out of the box. Again, if anybody has other experiences with NVLink in the situations I've described, I'd love to hear about them in the comments; this is not an area I've worked with a great deal.
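As one example of that configuration step, TensorFlow's MirroredStrategy lets you choose the cross-device communication implementation, and NCCL is the backend that can route GPU-to-GPU traffic over NVLink when the bridge is installed. A minimal sketch; whether it actually helps depends on your hardware and model:

```python
import tensorflow as tf

# Ask MirroredStrategy to use NCCL for the gradient all-reduce between
# GPUs. NCCL uses NVLink when the bridge is present; otherwise the
# transfers fall back to going over PCIe.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())

with strategy.scope():
    pass  # build and compile your model here, as usual
```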
What about laptops? If you're on the move and don't want to have a monster tower with you all the time, maybe you're going to send data up to the cloud for processing. What do GPUs look like on laptops? Some laptops support them really well, some not so well. Now, laptops have a lot of the same types of GPUs available. The Ampere cards have mostly not become available yet in mobile, but that is definitely coming. NVIDIA has made this a bit simpler to
understand: if you look at the Quadro RTX 5000 and 4000, the performance of the mobile version is nearly the same as the desktop model with the exact same name. So this makes it pretty simple; the laptop GPUs are named the same as the desktop ones. It's a little trickier to compare price here, because you're going to be buying the GPU as part of a system. The one that I have the most experience with myself is a Quadro RTX 5000 installed in a Lenovo ThinkPad P53 that Lenovo has been kind enough to loan me for a bit to try out with the YouTube channel, and I will say it is quite fast. It seems to have much the same specs as the higher-end desktop Quadros. What I have found on laptops is that, if you can afford them, the Quadros are great because they get you the memory that you need. Memory, to me, is really the key feature
for a deep learning machine. Now, if we look at picking graphics cards for laptops, this is the selection page of one vendor I've looked at laptops from before, and you'll see that they are basically offering the same 20 series that you get in the desktop line. However, if you're not going with a Quadro, at this point you're really not getting the memory that I would like to have in a laptop. But this gives you an idea of some of the relative costs, and all of these prices are in US dollars. Okay, there are all kinds of lines of GPU you've
probably heard of: names like Tesla, Ampere (the latest one), GeForce, and Quadro. What does all this mean? Let's take a look. Great, so which GPU should you get? Well, it really depends. As for the GPU that I would purchase: I work with deep learning and machine learning extensively, this is my career, and I'm fairly advanced in it, so my compensation and net worth are probably at the point where I'll throw money at these kinds of things more readily than some of you might. For me, on a desktop, whether it's a system I'm putting together myself or purchasing, I would probably look at dual 3090s, or starting with one 3090 and adding the second one if I really needed it. The 24 GB is really enough for what I do, and the cores are going to be quite fast. It's Ampere, so not everything is necessarily taking advantage yet of all the cool things this GPU can do in early 2021. If I didn't have quite as much
money to spend, I would very much be looking at the 3080 or the 3070. I would really try to get the 3080, because that extra two gigabytes of RAM could make all the difference for certain things that I work on, and it's a lot less money than that 3090. Other cards you might think about, if you really want power and want to pay less for it, are the 20 series. You can possibly find these on eBay and through other channels as well, since the Turing architecture is now one generation behind. I run a TITAN RTX myself; that's my primary GPU and I absolutely love the card. So you could potentially look at a 2080 Ti or something like that. It's very similar to the TITAN, it just does not have as much memory, and that 11 GB is going to do a lot for you. As for my own personal buying: if I were getting a laptop at this point, I would probably get a Quadro 5000 or a 4000; I just really want that RAM. I would probably get the 5000, because with the 4000 you're pretty close to the 20 series cards. I would also really be keeping an eye on the 30 series Ampere cards as they start to become available in laptops, and on the mobile equivalent of the Ampere A6000 as it becomes available. These are all the things that I would be watching. If you're a student, say in my deep learning course, I would highly recommend the 2080 at around seven hundred dollars; that is going to do pretty much everything in the class, as would probably the 3070. That's kind of how I see this. What do
you think, everybody? I'm sure everyone has different opinions here and has probably worked in different scenarios than I have, so let me know in the comments. I'm very curious to see; this is a discussion as much as me standing on high and telling you what you should do. I'm sharing my experience and my opinions with you, but they are opinions, and I'm very curious to hear from you as well. Thank you for watching this video; please like and subscribe if it was useful to you. Please follow the channel, because it covers all things artificial intelligence, and particularly generative networks like GANs and other cool technology.