CUDA Explained - Why Deep Learning uses GPUs

Captions
Welcome back to this series on neural network programming with PyTorch. In this video, we're going to introduce CUDA at a high level. The goal of this post is to help beginners understand what CUDA is and how it fits in with PyTorch and, more importantly, why we even use GPUs in neural network programming in the first place. Without further ado, let's get started.

To understand CUDA, we need to have a working knowledge of graphics processing units, or GPUs. A GPU is a processor that is good at handling specialized computations. This is in contrast to a central processing unit, or CPU, which is a processor that is good at handling general computations. CPUs are the processors that power most of the typical computations on our electronic devices. A GPU can be much faster at computing than a CPU; however, this is not always the case. The speed of a GPU relative to a CPU depends on the type of computation being performed. The type of computation most suitable for a GPU is a computation that can be done in parallel.

This brings us to parallel computing. Parallel computing is a type of computation whereby a particular computation is broken into independent, smaller computations that can be carried out simultaneously. The resulting computations are then recombined, or synchronized, to form the result of the original larger computation. The number of tasks that a larger computation can be broken into depends on the number of cores contained on a particular piece of hardware. Cores are the units that actually do the computation within a given processor. CPUs typically have four, eight, or sixteen cores, while GPUs have potentially thousands of cores. There are other technical specifications that matter, but this description is meant to drive the general idea. With this working knowledge, we can conclude that parallel computing is done using GPUs, and that the tasks best suited to being solved with a GPU are tasks that can be done in parallel. If a computation can be done in parallel, we can accelerate it using parallel programming approaches and GPUs.

Let's turn our attention now to neural networks and see why GPUs are so heavily used in deep learning. We have just seen that GPUs are well suited for parallel computing, and this fact about GPUs is why deep learning uses them: neural networks are embarrassingly parallel. Seriously. In parallel computing, an embarrassingly parallel task is a problem where little to no effort is needed to break the task down into an independent set of smaller tasks. Neural networks are embarrassingly parallel, and high-end GPUs have on the order of 3,000 cores that can run computations in parallel. Many of the computations we do in neural networks can indeed be easily broken into smaller computations that are independent with respect to one another. It's the nature of the computations used in neural networks that makes GPUs so useful in deep learning.

Let's look at an example computation that's often used in deep learning: the convolution operation. This animation showcases the convolution process without numbers. We have an input channel in blue on the bottom, a convolutional filter shaded on top of the input channel that is sliding across it, and a green output channel. For each position of the convolutional filter on top of the input channel, there's a corresponding green region on the output channel; this green region is the output of the convolution operation. At each point in the animation, these computations are happening sequentially, one after the other. However, each computation is independent from the others: none of the computations depend on the results of any of the other computations. As a result, all of these independent computations can happen in parallel on a GPU, and the overall output channel can then be produced after all of the computations have been completed. This allows us to see that the convolution operation can be accelerated using a parallel programming approach and a GPU.
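Here is a minimal sketch, not from the video, of a single-channel convolution written as an explicit loop over output positions (the names naive_conv2d, input_channel, and conv_filter are made up for illustration). It shows why the operation is embarrassingly parallel: each output position reads only its own patch of the input, so the loop iterations are independent and could all run at once on a GPU.

```python
import torch

# Minimal sketch (not from the video): a "valid" single-channel convolution written
# as a loop over output positions. Each position depends only on its own input patch,
# so every iteration is independent and could be computed in parallel on a GPU.
def naive_conv2d(input_channel, conv_filter):
    ih, iw = input_channel.shape
    fh, fw = conv_filter.shape
    output = torch.zeros(ih - fh + 1, iw - fw + 1)
    for i in range(output.shape[0]):
        for j in range(output.shape[1]):
            patch = input_channel[i:i + fh, j:j + fw]
            output[i, j] = (patch * conv_filter).sum()   # one independent computation
    return output

x = torch.rand(5, 5)        # blue input channel from the animation
w = torch.rand(3, 3)        # sliding convolutional filter
print(naive_conv2d(x, w))   # green output channel (3 x 3)
```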
This is where CUDA comes into play. As NVIDIA CEO Jensen Huang puts it: when NVIDIA pioneered the GPU computing approach, it was entire-stack thinking, from the architecture to the processor to the systems, the system software, the APIs, the libraries, and the application solvers. "We optimized across the entire stack, one domain at a time, and it is incredibly hard work. That's one of the reasons why it has taken us almost ten years."

NVIDIA is a technology company that designs GPUs, and they have created CUDA as a software platform that pairs with their GPU hardware, making it easier for developers to build software that accelerates computations using the parallel processing power of NVIDIA GPUs. An NVIDIA GPU is the hardware that enables parallel computations, while CUDA is the software layer that provides an API for developers. As a result, you might have guessed that an NVIDIA GPU is required to use CUDA, and once you have an NVIDIA GPU, CUDA can be downloaded and installed from NVIDIA's website for free. Developers use CUDA by downloading the CUDA Toolkit; with the toolkit come specialized libraries like cuDNN, the CUDA Deep Neural Network library.

In Jensen's words, the stack comes together in several steps: "The first step, of course, is to build an amazing GPU. The second step is to create the libraries for that domain: the system software, the system architecture, the APIs, and the accelerated libraries for that domain. In the case of high-performance computing, it's linear algebra, it's FFTs, it's all kinds of different libraries, and we have all of those libraries created; for deep learning we now have cuDNN, and for inference, TensorRT. The libraries are in place. The third step is to work with all of the application developers and solvers; our technical teams work hand in hand with them to refactor their algorithms and run their applications on our libraries."

With PyTorch, CUDA comes baked in from the start; there are no additional downloads required. All we need is a supported NVIDIA GPU, and we can leverage CUDA using PyTorch. We don't need to know how to use the CUDA API directly. Now, if we wanted to work on the PyTorch core development team or write PyTorch extensions, it would definitely be useful to know how to use CUDA directly. Much of PyTorch is written in Python; however, at bottleneck points, PyTorch drops into C, C++, and CUDA to speed up processing and get that performance boost. As the PyTorch team describes it: "We fight it in various ways. One of the simplest ways is that we just move the functions that are actually important into C or C++. It's a subtle trade-off, because as a user of PyTorch you want to make sure it's very easy to debug and extend while you're working with it day to day, but if you want performance, then the biggest hotspots cannot be in Python. The reason we went with Python instead of using C++ directly is that Python is the most popular data science language, but we have to make these constant trade-offs and fight Python all the time."

I'm in a Jupyter notebook now, and I want to show you how to use CUDA with PyTorch. Taking advantage of CUDA is extremely easy with PyTorch: if we want a particular computation to be performed on the GPU, we can instruct PyTorch to do so by calling the cuda function on our data structures. Suppose we have the following code: we assign t to be equal to a new torch tensor. We'll learn more about this in future videos; for now, let's just focus on the tensor output. We see a tensor with three elements: the numbers one, two, and three. A tensor object created in this way is on the CPU by default, and as a result, any operations that we do using this tensor object will be carried out on the CPU. Now, if we want to move this tensor onto the GPU, we just write t.cuda(). Calling the cuda function on a tensor returns the same tensor, but on the GPU. So after running this code and looking at the tensor output, we have the same tensor with three elements, one, two, and three, but we also have a device specified; this is what happens whenever the device is not the CPU. We can see that our device is cuda:0. The zero stands for the first index, and the reason for this is that PyTorch supports multiple GPUs, so if you had multiple GPUs, you could put this tensor on a particular GPU. This ability makes PyTorch very versatile, because computations can be selectively carried out either on the CPU or on the GPU.
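The following is a minimal sketch of the notebook example described above, assuming a CUDA-capable NVIDIA GPU is present; the availability check is an addition for safety, not part of the original demo.

```python
import torch

# Create a tensor; by default it lives on the CPU, so no device is shown in the output.
t = torch.tensor([1, 2, 3])
print(t)                        # tensor([1, 2, 3])

# PyTorch ships with CUDA support baked in; this reports whether a supported GPU is visible.
print(torch.cuda.is_available())

# Calling cuda() returns the same tensor, but living on the GPU. The index 0 refers to
# the first GPU, since PyTorch supports multiple GPUs.
t = t.cuda()
print(t)                        # tensor([1, 2, 3], device='cuda:0')
```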
With that being said, I want to talk to you about a looming question. We said that we can selectively run our computations on the GPU or on the CPU, but why not just run every computation on the GPU? Isn't a GPU faster than a CPU? The answer is that a GPU is only faster for particular, specialized tasks. One issue we can run into is bottlenecks that slow our performance. For example, moving data from the CPU to the GPU is costly, so when we do this, the overall performance might be slower if the computation task is a simple one. Moving relatively small computational tasks to the GPU won't speed us up very much and may indeed slow us down. Remember, the GPU works well for tasks that can be broken into many smaller tasks, and if a compute task is already small, we won't have much to gain by breaking it up and moving it to the GPU. For this reason, it's often acceptable to simply use a CPU, especially when just starting out; as we tackle larger, more complicated problems, we can begin using the GPU more heavily.
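As a rough illustration of that trade-off (this is not from the video, and the actual numbers depend entirely on your hardware), the sketch below times a matrix multiplication on the CPU and on the GPU, counting the host-to-device copy as part of the GPU cost. For a tiny input the transfer overhead typically dominates, while a large input is where the GPU pays off.

```python
import time
import torch

def timed_matmul(x, device):
    """Time x @ x on the given device, including the cost of moving x there."""
    start = time.perf_counter()
    y = x.to(device)               # host-to-device copy when device is "cuda"
    result = y @ y
    if device == "cuda":
        torch.cuda.synchronize()   # GPU kernels run asynchronously; wait for them
    return time.perf_counter() - start

# Rough illustration only: the first CUDA call also pays one-time initialization costs.
if torch.cuda.is_available():
    for name, x in [("small", torch.rand(8, 8)), ("large", torch.rand(4096, 4096))]:
        print(name, "cpu:", timed_matmul(x, "cpu"), "gpu:", timed_matmul(x, "cuda"))
```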
In the beginning, the main tasks that were accelerated using GPUs were computer graphics tasks, hence the name graphics processing unit, but in recent years many more varieties of parallel tasks have emerged. One such task, as we have seen, is training neural networks for deep learning. Deep learning, along with many other scientific computing tasks that use parallel programming techniques, is leading to a new type of programming model called GPGPU, or general-purpose GPU computing. NVIDIA has been a pioneer in this space. NVIDIA CEO Jensen Huang envisioned GPU computing very early on, which is why CUDA was created nearly ten years ago. Even though CUDA has been around for a long time, it is just now beginning to really take flight, and NVIDIA's work on CUDA up until now is why NVIDIA is leading the way in GPU computing for deep learning.

When we hear Jensen talk about the GPU computing stack, he is referring to the GPU as the hardware on the bottom, CUDA as the software architecture on top of the GPU, and finally libraries like cuDNN on top of CUDA. This GPU computing stack is what supports general-purpose computing capabilities on a chip that is otherwise very specialized. We often see stacks like this in computer science, as technology is built in layers. Sitting on top of CUDA and cuDNN in this stack is PyTorch, the framework we'll be working with, which ultimately supports applications on top. The paper I'm showing here takes a deep dive into GPU programming and CUDA, but it goes much deeper than we need. We will be working near the top of the stack with PyTorch; however, it's beneficial to have a bird's-eye view of just where we're operating within the overall stack.

We are ready now to jump in with section two of this neural network programming series, which is all about tensors. Remember to check the blog for this video on deeplizard.com, and don't forget to check out the deeplizard hotline for exclusive perks and rewards. Thanks for watching and supporting collective intelligence. I'll see you in the next one.

Computing is the most important invention of humanity. It is the single most important tool that we have ever created. Over the last 25 years, the computer has advanced in performance one hundred thousand times. Scientists and researchers are at the brink of discovering solutions for precision medicine. They're at the brink of being able to solve weather prediction and understand climate. We're at the brink of being able to discover the next groundbreaking material that's light and strong, or new ways to store energy. We're at the brink of discovering a way for machines to operate themselves. We're at the brink of discovering artificial intelligence. [Music]
Info
Channel: deeplizard
Views: 220,925
Keywords: deep learning activation function, AI, artificial intelligence, artificial neural network, autoencoders, batch normalization, clustering, CNN, convolutional neural network, data augmentation, deep learning, education, Tensorflow.js fine-tune, image classification, Keras, learning, machine learning, neural net, neural network, Python, relu, Sequential model, SGD, stochastic gradient descent, supervised, Tensorflow, Theano, transfer learning, tutorial, unsupervised learning, TFJS, PyTorch
Id: 6stDhEA0wFQ
Length: 13min 32sec (812 seconds)
Published: Sun Sep 09 2018