Everyone is busy talking about how amazing ChatGPT
is, but all I want to do is take a look behind the curtain and figure out what kind of hardware
it's running on. During my research for this video I came across some really interesting
answers, and without spoiling anything, you will be surprised how old some of the
hardware is that made ChatGPT possible in the first place. So without further ado, let's take
a closer look at the hardware behind ChatGPT. To start off this video, it's important to know
that there are two very different phases during the development of a machine learning model like
ChatGPT, and those two phases also have very different hardware requirements. First you have
to train the neural network. The training phase is where the neural network is fed with huge amounts
of data, which is then processed by billions of parameters. It's at this stage where the
combination of hardware and software is forming the neural network. It's basically the birth of
AI. The hardware requirements during the training stage are massive, because it has to handle
insanely large amounts of data being run against billions of different parameters and repeat that
process over and over again. What we are currently experiencing as ChatGPT is the so-called inference
phase, where an already fully trained and working neural network is applying its learned behavior
to new data, like your inputs and questions. Running inferences is, in general, less resource
intensive, at least when it comes to raw compute power. Low latency and high throughput become
much more important, because the AI is responding to many simultaneous requests, not unlike other
web-based services. But even if running a single instance of inference is a lot less demanding than
training the neural network in the first place, the sheer scale of providing inference to
potentially millions of users at the same time multiplies the hardware requirements many times over.
In a nutshell, training a neural network requires a huge amount of focused compute power, but once
training is finished this part is done. Using the neural network, which is called inference,
has a much lower base hardware requirement, but deploying the AI to many users at the same
time can greatly increase those requirements.
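To make the difference a bit more concrete, here's a minimal, purely illustrative PyTorch-style sketch (my own toy example, not OpenAI's actual code): training runs data forward and backward through the network and updates its parameters over and over again, while inference is just a single forward pass over your input.

```python
import torch
import torch.nn as nn

# Toy stand-in for a language model; real models like GPT-3 have billions of parameters.
model = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 128))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Training phase: forward pass, backward pass, parameter update, repeated over huge
# amounts of data. This is the compute-heavy part that needs a supercomputer.
for step in range(1000):                      # real training runs for weeks on thousands of GPUs
    x = torch.randn(32, 128)                  # a batch of training data (random here)
    target = torch.randn(32, 128)
    loss = loss_fn(model(x), target)
    optimizer.zero_grad()
    loss.backward()                           # compute gradients for every parameter
    optimizer.step()                          # adjust the parameters

# Inference phase: a single forward pass, no gradients, no parameter updates.
# Cheap per request, but it has to happen for every single user query.
with torch.no_grad():
    answer = model(torch.randn(1, 128))       # "your input or question"
```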
Now that you know the basic difference between AI inference and training, let's figure out what specific hardware was used to train the neural
network of ChatGPT, before we then take a look at the systems running the inference process. Of
course Microsoft and OpenAI are trying to keep the exact hardware configuration a secret. There
are only very general and open-ended answers, like that ChatGPT was trained on Microsoft Azure
infrastructure and that many AI models these days are trained on Nvidia GPUs. But of course
that won't deter us from finding out more! In May of 2020, almost three years ago, Microsoft
announced a new supercomputer built exclusively for OpenAI to train GPT-3, which is a predecessor
to the machine learning model used for ChatGPT. Microsoft wasn't very specific but revealed that
the supercomputer was using more than 285,000 CPU cores and over 10,000 GPUs. Microsoft also
claimed that this supercomputer would place within the top 5 of the TOP500 supercomputer list,
which at the time meant a peak performance of over 100 petaflops. And even though Microsoft
tried to hide the specific hardware used in that supercomputer, they weren't very successful,
at least when it comes to the GPUs. A scientific paper about large language models, published by
OpenAI in July of 2020, reveals, and I quote: "All models were trained on V100 GPUs on part of
a high bandwidth cluster provided by Microsoft." This information is all we need, because while
285,000 CPU cores are nothing to scoff at, when it comes to running specialized AI calculations they pale in comparison to 10,000 GPUs. The software used by OpenAI utilizes Nvidia's CUDA deep neural network library, and as such the training really only happens on the GPUs; the CPUs are more
of a supporting actor. Now that we figured out that GPT-3, a precursor to ChatGPT, was trained
on Nvidia V100 GPUs, let's see how fast they are, why OpenAI and Microsoft selected this specific
hardware and what it means for the training of ChatGPT. If you're not an AI you'll probably store
your passwords in your brain. That's where today's sponsor NordPass comes in. I've always thought
of myself as being good with passwords, but to be honest, with the ever increasing amount of
accounts and logins I'm confronted with, I started to use repetitive and low quality passwords.
Maybe you too have caught yourself using the same or slight variations of the same password
before. I even had to request new passwords because I forgot what specific password I used
for that one website. It got to a point where I was actually thinking about using those login
with Google or Facebook shortcuts, even though I would never want to share my logins with companies
who already know too much about me anyways. Using a unique and secure password is super important,
especially today, where your personal data is at risk in so many different ways. All it takes is
a single database breach or a website that didn't properly store and hash your password. NordPass
provides a robust and secure solution. You know I like to dig a little bit deeper and with NordPass
all your passwords are encrypted locally using XChaCha20 encryption, which is a faster and more
secure option compared to AES. Security isn't the only advantage of NordPass. Convenience isn't
my motivation for using a password manager, but NordPass does it really well. Of course you have
desktop clients for all operating systems from Windows and Mac OS to Linux and apps for Android
and iOS. It's so well integrated, you almost forget you are now using 100% unique and secure
passwords for every single one of your logins. NordPass is already very affordable and with
the code highyieldnordpass you get an exclusive two-year offer with an additional month for free
on top. Plus there are options for businesses using a company domain. If you still approach your
password security by hoping for the best, now is the right time to do something about it! For more
information go to nordpass.com/highyieldnordpass or click the link below the video and use code
highyieldnordpass. With that let's get back to Nvidia's V100 GPUs and why they were chosen by
Microsoft and OpenAI. Nvidia's Volta architecture, that's what the V in V100 stands for, is a really
interesting design for two distinct reasons: first, it introduced a major architectural change
over all previous Nvidia GPUs and second, from today's point of view it's actually really old
hardware. Nvidia's V100 GPUs are based on GV100, an 815 square-millimeter silicon chip with 21.1 billion transistors, produced by TSMC in a 12 nanometer process. Compared to the previous Pascal generation flagship GP100, it doesn't look that impressive. Sure, it's faster, but the increase
in FP32 performance isn't that large and comes at a huge cost in the form of a massively increased
die size. But this comparison tells only half the story, because for the first time ever GV100
introduced Nvidia's brand new tensor cores, something that didn't exist on GP100. If you're
a gamer, I'm sure you've heard of tensor cores before. They have been a part of Nvidia's GeForce
GPUs since the release of Turing and are used to accelerate DLSS. On the surface tensor cores
are not that different from traditional GPU cores: they are specialized hardware that excels at matrix processing. To put it simply, they can run a lot of computations in parallel, but they are limited to the basic multiply-accumulate operations used in machine learning. Lots and lots of very simple calculations at the same time.
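If you want to picture what a tensor core actually computes, here's a tiny NumPy sketch of that multiply-accumulate operation. In Volta, each tensor core works on small 4x4 tiles, multiplying FP16 matrices and accumulating the result in FP32, and hundreds of them do this in parallel every clock cycle.

```python
import numpy as np

# The basic tensor core operation: D = A x B + C (a matrix multiply-accumulate).
# Volta's tensor cores do this on 4x4 tiles, with FP16 inputs and FP32 accumulation.
A = np.random.rand(4, 4).astype(np.float16)   # FP16 input matrix
B = np.random.rand(4, 4).astype(np.float16)   # FP16 input matrix
C = np.zeros((4, 4), dtype=np.float32)        # FP32 accumulator

# Done here in software for illustration; a tensor core does this in a single hardware step.
D = A.astype(np.float32) @ B.astype(np.float32) + C

print(D.shape, D.dtype)                       # (4, 4) float32
```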
Volta was Nvidia's first GPU architecture specifically designed to accelerate AI workloads, like training and inference. In the Volta architecture whitepaper
Nvidia claims up to six times faster AI inference and 12 times faster AI training. With just 640
new tensor cores, GV100 is able to output a massive 125 teraflops of FP16 tensor performance! I'm sure you now understand why
training a large-scale model like GPT-3 wouldn't have been feasible before the introduction of
Volta. This is the key to the whole story of ChatGPT. The version of Volta used in Microsoft's
AI supercomputer back in 2020 was most likely part of Nvidia's Tesla product family with up to 32
gigabytes of fast HBM2 memory. With 10,000 GPUs at 125 FP16 tensor core teraflops each, the whole system would be capable of 1.25 million tensor teraflops, which is 1.25 exaflops. We are talking about literal exascale performance here, at least in terms of FP16 tensor core throughput.
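Here's the quick back-of-the-envelope math behind that number, assuming all 10,000 GPUs and the 125 teraflops FP16 tensor peak per V100 (real-world utilization is of course well below peak):

```python
# Rough peak-throughput math for the 2020 GPT-3 training supercomputer.
gpus = 10_000
tflops_per_v100 = 125                          # FP16 tensor core peak per V100

total_teraflops = gpus * tflops_per_v100       # 1,250,000 teraflops
total_exaflops = total_teraflops / 1_000_000   # 1 exaflop = 1,000,000 teraflops

print(total_teraflops, "teraflops =", total_exaflops, "exaflops")   # 1250000 teraflops = 1.25 exaflops
```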
But here's the kicker: Volta was released back in May of 2017, almost six years ago, and it was already three years old in 2020. The reason Microsoft and OpenAI had to use Nvidia's Volta generation is pretty
simple: the new Ampere generation launched around the same time OpenAI started to train GPT-3; it was just a little bit too late. Planning and building such a powerful supercomputer takes time,
and waiting would have delayed the whole project. At the time Microsoft and OpenAI began planning
the supercomputer Volta was the only option and not only that, I'm sure that without Volta this
supercomputer would not have been built in the first place. And without it, no GPT-3 and probably
no ChatGPT. You can only envision and plan such a complex neural network if you have the hardware
to support your ideas. Now that we know when and how GPT-3 was trained, in 2020 on up to 10,000
Nvidia V100 GPUs, we can use that knowledge to try and figure out what hardware was used to
train ChatGPT. The relation of GPT-3 to ChatGPT is rather difficult to describe, because not all
information is public. GPT-3 is a larger and more general model with a wider variety of use cases.
GPT-3 can be used to generate text from prompts; it can translate, summarize and classify text
and much more. Some of these features are part of ChatGPT too, but GPT-3 in its original 2020 form
wouldn't be a very good chatbot for two reasons: first, it wasn't trained to give human-like
answers in form of a text-based chat conversation, and second, GPT-3 is a really large model and it
would require a huge amount of compute performance even during inference. That's why ChatGPT was
born. It's basically a much more specialized machine learning model focused on natural
text-based chat conversations and lower compute requirements. It's an evolution of GPT-3, but not
really better or more capable in general, just more focused and streamlined. If you have already
used ChatGPT you know that it has been trained on data with a cutoff before the end of 2021; that's
the reason it doesn't have up-to-date information on current events. Regarding the time frame of its
training, OpenAI states that ChatGPT is fine-tuned from a model in the GPT-3.5 series which finished
training in early 2022. ChatGPT and GPT-3.5 were trained on Azure AI supercomputing infrastructure.
With this information we now have a time frame, early 2022, and the confirmation that it was again
trained on hardware provided by Microsoft. The question is, was it the same supercomputer which
trained GPT-3? The answer is probably no, because as we discussed before, Nvidia's V100 GPUs were
already quite old when they were used for GPT-3 in 2020. On June 1st, 2021, Microsoft officially
announced the availability of Nvidia A100 GPU clusters to its Azure customers, which of course
includes OpenAI, and since we know that ChatGPT was trained on Microsoft Azure infrastructure
after the introduction of A100 GPUs, we can deduce that it was trained on this type of GPU. The
only question left is how many A100 GPUs were used? To try and answer that question let's take
a look at the specs of the A100 first. Nvidia's Ampere generation A100 is based on the GA100 chip, which packs 54.2 billion transistors into an 826 square-millimeter die produced
by TSMC in a 7 nanometer node. As before, the performance increase in traditional FP32 isn't
large at all, but tensor performance gets another huge bump to over 300 teraflops, 2.5 times the
performance of a single V100 GPU! And this speed-up is achieved with even fewer tensor cores, due
to the introduction of redesigned 3rd gen tensor cores. By the way, if you're wondering, second
gen was on Turing, between Volta and Ampere, but Nvidia didn't release any comparable products
on that architecture. Now of course Microsoft could have replaced all 10,000 Volta GPUs in
their supercomputer with 10,000 Ampere GPUs, but since ChatGPT is a more streamlined machine
learning model, it wouldn't have been a very economical decision, especially with Ampere being so much
faster. Luckily we have more information about the combined efforts from Nvidia and Microsoft
to create new AI supercomputer infrastructure. In October of 2021, before the training
of ChatGPT started, Nvidia and Microsoft announced an AI supercomputer they used to train a new and extremely large neural network called Megatron-Turing NLG. And no, I'm not making this
name up. Megatron uses 530 billion parameters, a lot more than the 175 billion of GPT-3, and
it was trained on 560 Nvidia DGX A100 servers, each containing 8 Ampere A100 GPUs. You don't
need AI to figure out that's a total of 4,480 A100 GPUs. The supercomputer was specifically
built to train a 530 billion parameter neural network like Megatron-Turing NLG, so it's more
than capable of handling the smaller ChatGPT model. Maybe it was even this exact same system
that trained ChatGPT, and if not I am confident the hardware used is very similar to the system
used for Megatron-Turing NLG: multiple clusters of Nvidia DGX or HGX A100 servers. Before we
talk about inference hardware, let's have a look at our findings so far: GPT-3 was trained on
a Microsoft supercomputer with 10,000 Nvidia V100 GPUs and over 285,000 CPU cores. We don't know the exact number of CPU cores or the CPU type, but in 2020 it was most likely Intel Xeon. These specifications have been confirmed by Microsoft and OpenAI. For ChatGPT the results are a bit
more of an educated guess, since official specs are not available. We know that Nvidia A100 GPUs
were used and at the time only AMD EPYC offered the new PCI-Express 4.0 standard, which is why
Nvidia Ampere GPUs were paired with AMD CPUs. A single NVIDIA DGX A100 server combines 8 A100
GPUs with two AMD EPYC 7742 server CPUs. If my deductions are correct and ChatGPT was trained
on a similar system as Megatron-Turing NLG, that gives us the following hardware specs for the
training of ChatGPT: 1,120 AMD EPYC CPUs with over 70,000 CPU cores and 4,480 Nvidia A100 GPUs.
That's close to 1.4 exaflops of FP16 tensor core performance!
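Here's how I arrived at those numbers. Everything below is an estimate built on the assumption of a Megatron-Turing-NLG-style cluster of 560 DGX A100 servers, each with two 64-core EPYC 7742 CPUs and eight A100 GPUs, and the commonly quoted 312 teraflops FP16 tensor core peak per A100:

```python
# Back-of-the-envelope estimate for the ChatGPT training cluster (my assumption:
# a Megatron-Turing NLG style system of 560 Nvidia DGX A100 servers).
servers = 560
cpus = servers * 2                   # 2x AMD EPYC 7742 per server -> 1,120 CPUs
cpu_cores = cpus * 64                # 64 cores per EPYC 7742      -> 71,680 cores
gpus = servers * 8                   # 8x A100 per server          -> 4,480 GPUs

tflops_per_a100 = 312                # commonly quoted FP16 tensor core peak (dense)
total_exaflops = gpus * tflops_per_a100 / 1_000_000

print(cpus, cpu_cores, gpus)                   # 1120 71680 4480
print(round(total_exaflops, 2), "exaflops")    # ~1.4 exaflops of FP16 tensor performance
```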
As always with my videos, and especially parts where educated guessing is involved, I invite you to cross-check my
assumptions. If you think I might be wrong or missed something, leave a comment down below.
Now let's talk about inference and the question of which hardware is currently powering ChatGPT.
Based on statements by Microsoft and OpenAI, ChatGPT inference is again running on Microsoft
Azure servers, and a single Nvidia DGX or HGX A100 instance should be plenty to run inference for ChatGPT, but of course not for all of its 100 million active users at the same time. There are
clever ways to try and calculate how many systems are currently used for ChatGPT, created by people
much smarter than me. SemiAnalysis came up with a very interesting model and the result is that
at the current scale it would require well over 3,500 Nvidia A100 servers with close to 30,000
A100 GPUs just to provide inference for ChatGPT. That's a massive amount, and a lot more than
what was used for training, which is supposed to be the more demanding phase. But as we talked
about in the beginning of the video, inference might be easy to run for a single instance, but if
deployed at scale, the hardware requirements grow with the size of the user base. One thing is
clear: at the current level of demand for ChatGPT, just keeping the service running requires a massive amount of hardware and costs between 500,000 and 1 million dollars per day. It's not cheap.
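For anyone who wants to check those inference numbers, here's the rough math. The GPU count follows directly from the roughly 3,500-server estimate; the cost per GPU-hour is just an implied figure I derived from the quoted daily cost, not an official price:

```python
# Rough scale check for ChatGPT inference, based on the SemiAnalysis estimate.
servers = 3_500                          # A100 servers with 8 GPUs each
gpus = servers * 8                       # 28,000 GPUs -> "close to 30,000"

daily_cost_low, daily_cost_high = 500_000, 1_000_000   # dollars per day, as quoted above

# Implied average cost per GPU-hour across the whole fleet (a sanity check, not a price list).
low = daily_cost_low / gpus / 24
high = daily_cost_high / gpus / 24

print(gpus)                                                      # 28000
print(f"{low:.2f} to {high:.2f} dollars per GPU-hour implied")   # roughly 0.74 to 1.49
```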
Right now the publicity is definitely making it worth it for Microsoft and OpenAI, but
in the long run such a system most likely won't be able to stick to a free to use model, unless
better and more efficient hardware reduces the cost of running inference at scale. And there's a
lot of new hardware on the way! So far we talked about Volta and Ampere, but Nvidia's Hopper
generation has been shipping for a while now, providing another level up in AI performance.
Compared to Hopper, Ampere and Volta almost seem tiny! GH100 has a massive 80 billion transistors
on an 814 square-millimeter die produced in TSMC's 4 nanometer node. This time even the speed-up in traditional FP32 performance is quite large, with over 3 times the flops of GA100. But the
tensor core speed-up takes the cake, delivering 1,000 teraflops of tensor performance on a single Nvidia H100 GPU. And it doesn't stop there: Nvidia also introduced a new INT8 mode with 2,000 TOPS, specifically tailored for AI demands.
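Putting the three generations next to each other, using the rough tensor throughput figures from this video (the exact numbers depend on the data type and whether sparsity is used), shows how quickly the AI-specific performance has been scaling:

```python
# Rough peak tensor throughput per GPU, in teraflops, as discussed in this video.
tensor_tflops = {
    "V100 (Volta, 2017)":  125,
    "A100 (Ampere, 2020)": 312,    # "over 300 teraflops"
    "H100 (Hopper, 2022)": 1000,
}

baseline = tensor_tflops["V100 (Volta, 2017)"]
for gpu, tflops in tensor_tflops.items():
    print(f"{gpu}: {tflops:>4} TFLOPS, {tflops / baseline:.1f}x Volta")
# V100: 1.0x, A100: ~2.5x, H100: 8.0x Volta
```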
The entire hardware industry is starting to shift its focus to architectures specifically designed to accelerate AI workloads. In 2017 Nvidia was
entering uncharted territory with their Volta GPUs and its largest competitor, AMD, was
focused on releasing its first generation Ryzen CPUs and avoiding bankruptcy. In 2023
AMD has been completely transformed as a company and upcoming CDNA3-based MI300 GPUs
will provide strong competition for NVIDIA, especially when it comes to AI workloads. I will
look at CDNA3 and MI300 in an upcoming video, so make sure you're subscribed if you're interested.
And GPUs are not the only product focused on AI, more and more so-called neural processing units
and AI engines are being developed. These are chips with only one focus: run AI training and
inference as fast and as efficiently as possible. Jim Keller, a legendary semiconductor engineer
who worked on the original AMD Athlon 64 and AMD's comeback Zen architecture, now works at Tenstorrent, a company focused on designing processors purely for AI and machine learning. It seems
like hardware currently only has one single focus: AI, AI and even more AI! While working on the
script for this video I watched a video from Tom Scott, where he voiced his thoughts
and opinions about AI. He was trying to compare it to the rise of the internet, but
wasn't sure how far along the curve we are with machine learning and AI. Is ChatGPT only
the beginning, his so-called Napster moment, or are we already close to the top of what's
possible? And while I can't answer the question, looking at AI from a hardware point of view paints
a pretty clear picture. Current high-end machine learning models like GPT-3 were trained on
what I would call first gen AI hardware and even the highly praised and disruptive ChatGPT
only used Nvidia's second gen Ampere GPUs, a product released almost three years ago. What we
are currently experiencing are AI models planned and trained on last generation hardware. Everyone
talks about the future of AI, but I can't help but point out that we don't even have to wait for the
future. Hopper released last year and has been shipped to customers in volume for a while now.
Just imagine the kind of AI models you could train on a supercomputer with 10,000 Nvidia H100 GPUs
compared to how 10,000 V100 GPUs trained GPT-3. It will take a while for the public to get access to
these new AI models, but they are already possible today. With more money flowing into AI, hardware
to accelerate it will advance even more rapidly. We have seen how fierce competition between Intel
and AMD affected the CPU market, and I'm sure Nvidia and AMD will fight for AI. AI progress is hardware-bound, and the hardware is just getting started. In only a few years' time, training a
model like ChatGPT will be part of your average machine learning course in college, and running inference will be done on a dedicated AI engine inside your smartphone. There are basically no limits, which is amazing and scary at the same time. In my opinion, ChatGPT isn't a "Napster moment",
it's more like pets.com. People use it and it's popular, but the real AI "Napster moment" will
be something where we don't have to ask ourselves the question; it will be strikingly obvious
for everyone to see. Thanks again to NordPass for sponsoring this video! I know, passwords
are something we don't like to talk about, but using unique and secure passwords to
protect your personal data has never been more important. Click the link below the
video and use code highyieldnordpass. I hope you found this video interesting, if you did
you know what to do, and see you in the next one.