comparing GPUs to CPUs isn't fair

Captions
If you didn't know any better, you would think that GPUs and CPUs were the same. They both take in data from the user, they both do some kind of math operation on it, and they both return it to the user, either in the form of graphics or computational results. I made a video about the physical limitations of CPUs and why CPUs can't have thousands of cores, but a lot of the comments were, "Well, hey man, GPUs have thousands of cores." So in this video we're going to talk about why GPU microarchitecture makes CPU cores and GPU cores a little different from each other.

Your modern GPU has on the order of 5,000 to 10,000, even some 15,000, GPU cores. A modern GPU such as the latest Nvidia 4090 is able to perform 49 teraflops; that's 49 trillion floating-point operations per second. The reason that floating-point numbers are so important in graphics processing is that, regardless of how beautiful your graphics look, at the end of the day all your GPU is doing is crunching numbers. These numbers are information from a scene, such as lighting information, texture values, and shader values, that all get put together to determine what color to make a pixel on your screen. This is all based on very complicated matrix multiplication and vector math that we're not going to touch on in this video, but just know that most of the time these values are stored as very high-precision floating-point numbers that your GPU needs to know how to process to determine what color to present on your screen.

Compare this to the Intel Core i9-13900K, which is only able to perform 849 gigaflops. That is more than 50 times less than our Nvidia 4090.
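As a quick check of that comparison, here is a minimal sketch that divides the two peak figures quoted in the video. The numbers are taken at face value from the transcript, not measured, and the function name `throughput_ratio` is made up for this illustration.

```python
# Peak throughput figures as quoted in the video (not measured here).
GPU_FLOPS = 49e12   # 49 teraflops, quoted for the Nvidia 4090
CPU_FLOPS = 849e9   # 849 gigaflops, quoted for the Intel Core i9-13900K


def throughput_ratio(gpu_flops: float, cpu_flops: float) -> float:
    """Return how many times more floating-point operations per second the GPU performs."""
    return gpu_flops / cpu_flops


ratio = throughput_ratio(GPU_FLOPS, CPU_FLOPS)
print(f"GPU / CPU peak throughput: about {ratio:.1f}x")
```

With the quoted figures the ratio comes out a little under 58, in the same ballpark as the 50 times mentioned in the video.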
With these numbers, you'd probably think the GPU has a computational advantage, and the numbers do play out that way, but there is one slight caveat. If that's the case, why are we not using GPUs for everyday computing? Why not replace the CPU in your computer with the processor running in your GPU? This is because GPU cores aren't exactly cores. Let me explain.

GPUs are able to perform high-throughput, high-bandwidth floating-point operations because of some very delicate design choices that made them good for high-speed math operations but bad for general-purpose computing. At the core of an Nvidia GPU is, well, a core. This core is the execution engine for the algorithms that are responsible for giving you cutting-edge graphics. Inside a CUDA core, there isn't much. The CUDA core has four primary components: first, an FPU, or floating-point unit, to conduct floating-point operations; an INT unit to conduct integer operations on scalar values; a dispatch unit to receive the data from its higher-level scheduler; and a results queue to give the results back to the higher-level scheduler.

While that sounds simple, there are some very strict limitations with this design. When a GPU receives an instruction to run, that instruction is received by a scheduler. That scheduler hands the task out to a thread to run that instruction, and then the thread makes use of the CUDA core to do the math operation. In the classical CPU example, think of a thread like the control unit and the CUDA core like the ALU.

Now, these threads are organized in groups called warps (that's w-a-r-p, like warp speed), and each warp contains 32 threads. To make execution fast, warps use a design principle called SIMD: single instruction, multiple data. By doing this, a warp has all 32 threads run the same instructions on different data. This enables the GPU to do high-bandwidth operations on large amounts of data, like graphics processing. By running the same instructions in parallel on bulk data, the GPU can outperform the CPU in terms of floating-point operations.

This does create constraints for your program, though. If threads in the same warp take a conditional branch, only threads on the same path of execution will execute; the rest will block. Let's say, for example, that a warp is executing this block here of Nvidia PTX pseudocode [shown on screen]. All the threads are parsing data, and if the data indexed by the thread ID is even, then condition X happens; otherwise condition Y happens. Let's say, for the sake of the example, that half of the threads meet condition X and half of the threads meet condition Y. Because of the SIMD principle that warps are designed around, the Y-condition threads will not begin to execute until the X-condition threads have finished executing and returned to the common part of the execution path, which is the area after the conditional jump in the if statement. Simply put, a warp can only execute one instruction at a time, therefore limiting the computing power of your GPU to the number of CUDA cores divided by the warp size, which is 32.
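The branching behaviour described above can be sketched with a small Python simulation. This is not the PTX shown in the video and it does not run on a GPU; it only counts instruction issue slots under the assumption that a warp runs one path at a time while the other threads wait. The names `run_warp` and `independent_instruction_streams`, the path lengths, and the core count of 16,384 are all hypothetical values chosen for illustration.

```python
WARP_SIZE = 32  # threads per warp, as stated in the video


def run_warp(data, x_path_len, y_path_len):
    """Simulate one warp executing an if/else and count instruction issue slots.

    A thread takes path X if its data element is even, otherwise path Y.
    The warp issues one instruction at a time, so the two paths run one
    after the other while threads on the inactive path sit idle.
    """
    assert len(data) == WARP_SIZE
    x_mask = [value % 2 == 0 for value in data]  # threads that take path X
    slots = 0
    paths = [None] * WARP_SIZE

    if any(x_mask):                 # path X is issued first; Y threads wait
        slots += x_path_len
        for tid in range(WARP_SIZE):
            if x_mask[tid]:
                paths[tid] = "X"

    if not all(x_mask):             # then path Y is issued; X threads wait
        slots += y_path_len
        for tid in range(WARP_SIZE):
            if not x_mask[tid]:
                paths[tid] = "Y"

    return slots, paths


def independent_instruction_streams(cuda_cores, warp_size=WARP_SIZE):
    """Cores divided by warp size: how many different instructions can be in flight."""
    return cuda_cores // warp_size


# Thread IDs 0..31 as data: half even, half odd, so the warp diverges.
divergent_slots, _ = run_warp(list(range(WARP_SIZE)), x_path_len=4, y_path_len=4)
# All-even data: every thread takes path X, so there is no divergence.
uniform_slots, _ = run_warp([2] * WARP_SIZE, x_path_len=4, y_path_len=4)

print("divergent warp slots:", divergent_slots)
print("uniform warp slots:  ", uniform_slots)
print("independent streams: ", independent_instruction_streams(16384))
```

In this model the divergent warp needs the slots of both paths added together, while the uniform warp needs only one path's worth, which is the cost the video is pointing at.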
CPU designers, on the other hand, treated their cores much differently. Each core is able to run an arbitrary set of instructions, organized into an arbitrary set of processes that are constantly context switching inside the kernel. Each CPU core is designed around the fact that these instructions may randomly branch to any instruction or randomly access memory at any time during the process's execution. Each core therefore has multiple layers of caches and branch prediction engines that protect the core from encountering delays during execution. GPU cores have neither of these.

At the end of the day, CPUs are designed to process programs that run on arbitrary input. Everything from your word processor, to the game you played earlier today, to the TCP stack that brought you this video was executed by your CPU. CPU cores are therefore more generalized. The GPU, on the other hand, had a much more narrowly scoped task when it was designed. Because of that narrow scope, the GPU designers could focus on this specific task and make the GPU more efficient at it than at everything else a CPU does.

So, do GPUs have thousands of cores? Yes, they do. 100%, they do. These cores are amazing and can do math at unfathomable speeds. But are these GPU cores that do exist in the thousands the same as CPU cores? No, not at all. CPU cores, because of their design to handle, well, everything, are just that: a jack of all trades but a master of none. So to compare CPU cores to GPU cores is questionable at best.
Info
Channel: Low Level Learning
Views: 285,483
Keywords: cpu vs gpu, gpu cores, nvidia cores, cuda cores, cuda programming, c programming, c programming tutorial, malware, reverse engineer, reverse engineering, reverseing, arm division, fixed point multiplication, hackers, security, binary, hexidecimal, raspberry pi, pico, rpi, microcontroller, arduino, maker, craft, hobby, electronics, wires, temperature, safety, project, board, electric, leds, led, thonny, os, ide, probe
Id: xi-wTlVUZsQ
Length: 6min 29sec (389 seconds)
Published: Sat Jan 21 2023