FASTER Inference with Torch-TensorRT | Deep Learning for Beginners | CPU vs CUDA

Video Statistics and Information

Captions
Hi everyone! In the last video we saw how to accelerate our programs with PyTorch and CUDA; today we will take it another step further with Torch-TensorRT. TensorRT is a software development kit made by NVIDIA. It is built on top of CUDA, and it uses very clever techniques to make our code run much more efficiently.

In this tutorial we will focus on a machine learning process called inference. This is where our model is trained, perfected, and ready to serve its purpose; in other words, ready to make predictions. Even though this is a beginner-friendly tutorial, believe it or not, today we will load a pre-trained neural network and use it to make predictions on data it has never seen before: specifically, a picture of my cat. We will also run a speed test comparing PyTorch models running on CPU, on CUDA, and on Torch-TensorRT, and we will learn how to work with all of them. So, are you ready? Let's roll!

We begin by cloning Torch-TensorRT. Inside the terminal, type git clone https://github.com/NVIDIA/Torch-TensorRT, which copies the official GitHub repository onto our computer. Hit Enter, and once the download is complete, navigate into the brand-new folder it created with cd (change directory) and then Torch-TensorRT.

Here we will do something a bit different: we will run our code in a Docker container. A container is quite similar to an Anaconda working environment, but one difference is that Docker containers include all the libraries, APIs, and other dependencies needed to run a specific application. We are basically using an isolated working environment which the developers of the software have already created for us; in this tutorial we are accessing it rather than creating it from scratch and installing things inside it.

So let's install Docker, following the installation guide I've included in the description, step by step. Scroll down and copy the two-line "setting up Docker" command with the copy icon, then paste it into the terminal. On your end you will be prompted for your password; I already entered mine off camera, so it's not going to ask me again.

Back in the installation guide, we follow the post-installation actions. First we create a docker group: copy the very first command and paste it into the terminal. You can see I already have everything installed, which is why my docker group already exists; on your end you will obviously see a different message. Next we add our user to the docker group with the second command, which basically grants us root privileges for Docker. Then we apply the changes we just made with the next command, newgrp docker. If everything worked, we can now type docker run hello-world without adding sudo in front of it (sudo is usually what grants root privileges, but we already took care of that, so we don't need it anymore). Press Enter, scroll up, and yep: "Hello from Docker! This message shows that your installation appears to be working correctly." Perfect, let's move on.

Next we install the NVIDIA Container Toolkit; once again, just copy the appropriate command from the installation guide and paste it into the terminal. Additionally, we will need the nvidia-docker2 package. First we refresh the apt package index with sudo apt-get update, and now we can use it to install nvidia-docker2.
So let's type sudo apt-get install -y nvidia-docker2. We also need to restart Docker, which we do with sudo systemctl (as in "system control") restart docker. Now let's check that everything worked: back in the installation guide, copy the base CUDA container command (you can find it here), paste it, and if you are presented with information about your GPU, everything is perfect. On my end I zoomed in quite a bit, but on your end you should see something along these lines.

Next, it's time to access a Torch-TensorRT container, and I will show you two different methods. The first is docker build -t torch_tensorrt -f ./docker/Dockerfile . (note the capital D in Dockerfile and the trailing dot). Run this command; you can copy it from the description. On your end it will take much longer, but because I already built this image before, I don't have to wait. Once we build this Docker image we can access it whenever we'd like. All the commands we've used so far were just one-time setup and won't need repeating, but the next few commands are very important, because we will need to repeat them every time we'd like to access our Docker container.

Building an image is simply not enough; we also need to run it. This command is quite long, so instead of typing it, copy it from the description of the video and paste it in. Give it a run, and once you see the workspace# prompt, you are inside the container. We can now launch Jupyter Notebook. First navigate to the folder where all the notebooks are saved with cd Torch-TensorRT/notebooks, and from there run jupyter notebook --allow-root --ip 0.0.0.0 --port 8888 (8888 is the conventional Jupyter port). Press Enter, open the printed URL, and replace "hostname" with localhost; this is very important. We now see a bunch of really nice tutorial notebooks, which we will ignore for the moment; instead, we create a brand-new Python 3 notebook and give it a name.

Before we move on with the coding, I want to show you another way to access a Torch-TensorRT container: if you run into any issues, you can always try the NVIDIA NGC route. The link is of course in the description, and this path assumes an nvidia-docker2 installation. Press "Tags" and select a version from the list; in my case I'm going for the December 2021 version. Press the three-dot icon and then "Pull Tag", paste the command into the terminal, and give it a run. Perfect. Back on the NGC page, navigate to "Overview", scroll down, and select whichever command matches your Docker version (the first one, in my case). Don't run it just yet, because we need to modify it a bit: paste it, replace xx.xx with 21.12 (or whichever version you selected on your end), add a slash in front of the container directory (otherwise we might get an error), and lastly add --net=host at the very beginning (otherwise our Jupyter Notebook might not run properly). Okay, give it a run. Perfect, we are inside the workspace. To launch Jupyter Notebook we use the exact same command as before (I'll just fast-forward it), open the link, and once again replace "hostname" with localhost. Perfect: we are now accessing Jupyter Notebook from a different container.

Back inside our inference notebook, we begin with the imports: first import torch (as in PyTorch), and then, from torchvision,
we will import models as well as transforms. Give the cell a run.

The model we will use today is called ResNet-50. It is a state-of-the-art artificial neural network, and its main purpose is to classify images. If you're not sure what neural networks are, I have a very handy tutorial explaining them in detail; you can find the link in the description, so watch that first and only then come back here. ResNet was trained on a huge database called ImageNet, and it can classify 1,000 different categories of animals, objects, body parts, and all kinds of stuff. The best part is that we are more than welcome to borrow it for our needs. Let's load it with models.resnet50, specifying pretrained=True inside the round brackets, because we are interested in the trained version; that's why we do it. Assign it to model and print it right below. Give it a run, and if you were ever curious what a neural network looks like, this is an example of an extremely complex one. That's state of the art for you.

Now that we have a neural network, we also need an image it has never seen before. In my case I'm going to use a picture from my personal gallery: it's just my cat sitting in a pile of flour, looking extremely tough, like he's from the mafia or something. Back in Jupyter's file system, create a brand-new folder, rename it to "inference", navigate there, and press "Upload". Here we can select any image we'd like from our computer, image1.jpg in my case. Upload it, and perfect; we can now go back to our notebook, where we will use the Pillow module to load this image. In the very last cell, type from PIL (as in Pillow) import Image, with a capital I. Right below, type Image.open, specifying the location of the image inside the round brackets: ./inference/image1.jpg in my case. Assign it to img (as in image), and just to make sure everything looks as expected, print it as well. Run the cell with Shift+Enter, and there you go: here's our image.

But that's not all. In order to use this image for prediction, we also need to apply a few transformations to it. If you're not sure what transformations are or why we need them, definitely check out my machine learning database tutorial, where I explain it in great detail. We start with transforms.Compose, with a capital C, to which we pass a list of very important transformations. The first is transforms.Resize, with a capital R, because our image is way too big for ResNet; set it to a much smaller size of 256 pixels, for both the width and the height. Additionally, we'd like to slightly crop our image, so add transforms.CenterCrop, in CamelCase; the final image we pass to ResNet is expected to be 224 pixels all the way around, so if you choose not to crop your image, please make sure you resize it to 224 instead of 256 pixels. Next, add transforms.ToTensor, in CamelCase as well. This command converts our image into a multi-dimensional data structure called a tensor; I will expand on it in future tutorials, but for now just imagine it as a special data structure for machine learning. Lastly, ResNet expects a normalized input. Normalizing basically means reducing the range of values without affecting their actual ratios; for example, we can normalize values between 0 and 100 to values between 0 and 1, where 88 becomes 0.88, and so on. ResNet expects some very specific normalization parameters, so let's just copy them from the documentation (I'm including the link in the description): scroll down, find the first transforms.Normalize command, copy it, and paste it back inside our notebook.

Cool, I'm super happy with our transformations, so assign them to a variable called transform. Then we apply these transformations to our image and reassign the result: img = transform(img). Just to make sure everything worked, also print the shape of the image with img.shape and give it a run. Perfect: we can see three color dimensions (one for red, one for green, one for blue), and our image is now 224 pixels wide and 224 pixels high. Almost perfect; the only thing we are missing is something called a batch size. Whenever we train a neural network, we expose it to an enormous number of images, but if we tried to feed all of them at once, we would probably run out of memory; it's just way too much data to handle in one go. That's why we separate our data into batches and load it bit by bit. A batch size of 32 is very common (sometimes you'll see 64), and the key point is that the first dimension of our image tensors must store the batch size. For example, if we tried to feed our current image into ResNet as-is, it would think we have three images per batch, each with 224 color channels, and so on, which would obviously return an error. The solution is to add a brand-new dimension storing our batch size, and since we are only dealing with a single example, our batch size is one. We do this with torch.unsqueeze, passing our image along with the position of the new dimension; in our case that's 0, because the batch size comes first. Assign this expression to image_batch, and once we print its shape, we see the brand-new dimension for our batch size. Perfect. Now let's make a
prediction. In order to do so, we first set our model to evaluation mode with model.eval(). Next we disable the gradients (as in gradient descent, which we covered not too long ago in a dedicated tutorial; definitely check it out for more detail). The reason we do this is that we only need gradients for training; we don't need them for inference, and we don't want them consuming memory unnecessarily. So let's turn them off: type with torch.no_grad(): and then, inside the with block, pass the image to the model: output = model(image_batch).

Cool. Now, outside the with block, we process the output and convert it into probabilities. Essentially, we get a very long list of all the classes our model is familiar with, and the class with the highest probability is exactly what we're looking for. To do this, we pass the output through something called a softmax function: torch.nn.functional.softmax, to which we pass output[0] and, as a second argument, dim (as in dimension) = 0. Assign this expression to probs (as in probabilities). But a funny thing happens if we try to print these probabilities: what exactly do they mean? Let's convert them into human language, and only then can we understand the meaning.

As a preparation step, we need a list of all the class names ImageNet has to offer. The reason is that our model is only familiar with the numeric class values, while what we're actually looking for is their meaning in English; for example, the name of the class at index 1 is "goldfish", while the class at index 2 is "great white shark", and so on. We can find this category-number-to-category-name mapping on GitHub, so just follow the link I've provided in the description and press "Raw". Copy the URL from the address bar, and back in our notebook, first import pandas as pd and then read the URL we just copied with pd.read_csv, passing the URL as a string along with an additional argument, header=None. Assign it to categories, and of course print it in the line below. (If you don't have pandas installed, a simple cell with !pip install pandas takes care of it; I already have mine installed, but I don't think you do.) Run the next cell. Perfect, and we see that all the class names live in column 0, so let's focus on it: categories[0]. Rerun it; perfect. Now, if we specify index 1, we should hopefully get a goldfish. Run it; perfect, that's a goldfish. Index 2? Great white shark. Perfect, let's move on.

Okay, we're almost done with inference. Now let's display the top five classes for our image, and hopefully they're all cats. We begin by creating a variable called topk and assigning it 5. Then we use the torch.topk method to extract the five highest probabilities along with the class numbers they represent: we pass probs as the first argument and topk (which is basically 5) as the second.
Since this expression returns both the probabilities and the class numbers, we unpack it into two variables instead of one: the first we'll call prob (probs is already taken), and the second class_number. Cool. Now let's print each of these probabilities with for i in range(topk):. The first thing we do is create a local variable called probability and assign it prob[i].item(), which returns each of our probabilities one at a time. Next we create another local variable, class_name, and assign it categories[0], which holds all our class names, indexed by the particular class number we're looking for, which is given by class_number[i]. I have a feeling this returns a tensor, so let's convert it to an integer, because whenever we specify an index it must be an integer; tensors are a big no-no here. Then we create a very fancy print statement: print with a set of quotes containing two curly-bracket placeholders separated by a space, followed by a .format method to which we pass probability as our first placeholder and class_name as our second (without the typos). And actually, let's display it as a percentage: instead of 0.88 I want to see 88, so we add a percent symbol right next to the first placeholder, multiply probability by 100, and round it by converting it into an integer (a very fast way of rounding something). Cool, run the cell, hopefully without any typos... aha! It appears that my 100% Albertan cat is actually Egyptian. Amazing; I need to check if he responds to "habibi". We also see "tabby" at 19%, which I believe is another type of cat (double-check it on Google, please), as well as a tiger cat, which is perfect, because my cat is the size of a tiger. We also have a Siamese cat and... "carton"? Isn't that a box or something? But that's okay, because it's only at three percent, and I guess since cats like boxes, we can sometimes find them in boxes.

Cool, so we officially know how inference works. If we're only classifying a single image, this technique, running on CPU, is going to do the trick. But what if our program is classifying an entire collection of images? Can we really get away without CUDA or Torch-TensorRT? Let's find out. For this we will use a special benchmarking function, which you can find in the tutorial notebooks we mentioned earlier. Open the ResNet-50 example, which is basically an advanced version of this tutorial: you'll see that many of the commands are completely different and it's slightly more complicated, but it deals with four different images for prediction rather than one, so it's definitely worth a look. What we're looking for is the benchmark utility function, so keep scrolling, and once you see the "benchmark utility" header, copy the entire cell below it (cell number 9). We will of course adjust this function for our needs, and I will explain it in great detail, so paste it into our notebook and let's have a look.

The name of our speed-test function is benchmark, and the only argument it actually requires is the model itself. The rest of the parameters have default values, so even if we don't provide anything for them during the function call, the function knows to use those values instead. Let's adjust them to something a bit more realistic. Instead of 1024 images per batch, we reduce it to 32, which is much more conventional. We also notice that this function deals with grayscale images, because they have only a single color channel, while our images are colorful RGB images with three color channels, so let's adjust 1 to 3. The last parameter I'd like to slightly adjust is nruns: let's reduce it to 100, because we're after a quick speed test and this video is long enough already.

Inside the body of the function, the first thing we do is generate random values for our input, based on the input shape we selected earlier; essentially, we generate 32 dummy images and use them for our speed test. Then the function sends our dummy images to CUDA, which is something we need to adjust, because we also want to benchmark our CPU. Let's do something very clever about it: add an additional parameter called device with a default value of "cuda", and send the input data to device instead of directly to CUDA. In the lines below there are two lines that aren't really relevant for our example, so let's just get rid of them. Now the fun part begins. Before we even start timing our speed test, we need to warm up our device: we run the input data through the model 50 times before the speed test (it's 50 because nwarmup equals 50 by default). Once the warm-up is done, we call the synchronize method. The reason is that our CPU doesn't just stop and wait for the GPU to finish its calculations; it keeps running and processing the rest of our code. With this method we ask the CPU to stop and wait, and only after the GPU calculations are done do we move on with the actual timing. Given the nruns value we selected above, the input data runs through the model 100 times, during which we measure the average time it takes for our model to make a prediction. And that's pretty much all we have in this function; I could break it down even further, but I think that's enough information. Give the cell a run, and then we can use this function on our model to check exactly how fast it is.

In the cell below, type benchmark, passing our model along with device="cpu", at least at first. So, are you ready? Give it a run and let's find out exactly how fast our model is... Okay, so the average batch time is 661 milliseconds, which, I don't know, we don't have anything to compare it to yet, so we don't know if it's a lot or a little. Let's send our model to CUDA and measure again: type model = model.to("cuda"), and right below, benchmark(model); this time we only specify the model, because the device defaults to CUDA. Give it a run and see the difference (Shift+Enter)... oh, okay, that's faster. Wow, you guys: the average batch time is 23 milliseconds. Let's see the exact ratio: 661 divided by 23...
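The benchmark utility we just adjusted can be sketched roughly as below. This is a simplified, hedged reconstruction of the notebook's cell (the real version has a few extra details): the device parameter is the adjustment described above, and the tiny Linear model at the end is just a stand-in so the sketch runs even without a GPU.

```python
import time
import torch

def benchmark(model, input_shape=(32, 3, 224, 224),
              nwarmup=50, nruns=100, device="cuda"):
    """Feed random dummy batches through `model` and report the
    average time per batch (simplified notebook utility)."""
    # 32 dummy images (by default) generated from the input shape.
    input_data = torch.randn(input_shape).to(device)
    with torch.no_grad():
        for _ in range(nwarmup):      # warm up the device before timing
            model(input_data)
        if device == "cuda":
            torch.cuda.synchronize()  # wait for queued GPU work to finish
        timings = []
        for _ in range(nruns):
            start = time.time()
            model(input_data)
            if device == "cuda":
                # Time the GPU work itself, not just the kernel launch.
                torch.cuda.synchronize()
            timings.append(time.time() - start)
    avg_ms = 1000 * sum(timings) / len(timings)
    print(f"Average batch time: {avg_ms:.2f} ms")
    return avg_ms

# Quick demo with a tiny stand-in model so the sketch runs on any machine;
# in the notebook you would call benchmark(model, device="cpu") and then
# benchmark(model) after model = model.to("cuda").
tiny = torch.nn.Linear(8, 4).eval()
benchmark(tiny, input_shape=(2, 8), nwarmup=2, nruns=5, device="cpu")
```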
Wow, okay, so that's almost 29 times faster. Holy smokes! So it doesn't make much sense to use the CPU whenever we're predicting an entire collection of images; at least 32 of them, we now know for sure. Let's quickly adjust our cat-predicting model to run on CUDA instead. Scroll up; we need two little adjustments: sending our model to CUDA, and sending our data to CUDA. Where we load the ResNet model, simply add a .to("cuda") method and rerun the cell. Then, scrolling below, right after our unsqueeze command, add .to("cuda") as well, which sends our data to CUDA too. Rerun this cell, then the prediction cell, and lastly the results cell. Cool: the results are exactly the same; the only difference is we are now getting them 29 times faster.

So that was CUDA, but how about Torch-TensorRT? We started by installing it and talked about it so much, but we haven't had a chance to experiment with it; let's do it now. Torch-TensorRT expects a traced version of our model, which is basically a recording of how the model operates: we give it some example input and trace it as it passes from function to function. To trace our model, type traced_model = torch.jit.trace (jit stands for "just in time"), passing two arguments: the first is the model itself, and the second is the example data we'd like to provide, which in our case is a list containing torch.randn (as in random numbers) with the exact same shape as before: 32 images per batch, three color channels per image, and 224 pixels for both width and height. Lastly, we send this data to CUDA, because, as you'll remember, Torch-TensorRT is built on top of CUDA; that's why we do it. Run the cell with Shift+Enter. Perfect.

Now we can convert our traced model to a Torch-TensorRT model. In the cell below, import torch_tensorrt, of course, and create a brand-new variable called trt_model set to torch_tensorrt.compile with three arguments. The first argument is our traced model. The second (actually, I think my head is about to block the rest of the code, so let's add a few more cells below, sorry guys) is inputs, which in our case equals a list containing torch_tensorrt.Input, with a capital I, once again with the exact same shape of 32 images per batch, three color channels, and 224 pixels all the way around, plus a second argument of dtype (as in data type) set to torch.float32, a floating-point number stored in 32 bits; that's the official translation. The last argument is enabled_precisions, which we set to a set containing torch.float32 once again. Cool, I believe we can give it a run; let's do it. Awesome: it looks like we have successfully converted our model into a Torch-TensorRT model, despite all these warnings.

Now let's check how fast it gets. In the cell below, type benchmark, passing our trt_model; once again we don't need to specify a device, because it defaults to CUDA. Give it a run and... oh wow! We are getting 13 milliseconds. Holy smokes, you guys! So when we run inference on Torch-TensorRT, it's twice as fast as a PyTorch model running on CUDA. Okay, we definitely have a winner. Now let's adjust our cat inference for the very last time so it runs on Torch-TensorRT instead of just CUDA. Scroll up: first we will copy the prediction cell.
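Before wiring the cat prediction up to the new model, here is a compact sketch of the trace-and-compile flow just described. A tiny convolutional network stands in for ResNet-50 so the tracing part runs anywhere; the torch_tensorrt.compile call itself is shown in comments, because it needs a CUDA GPU and the torch_tensorrt package.

```python
import torch
from torch import nn

# Tiny stand-in network; in the video this is the pretrained ResNet-50.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),  # [1, 3, 32, 32] -> [1, 8, 30, 30]
    nn.ReLU(),
    nn.Flatten(),                    # -> [1, 8 * 30 * 30]
    nn.Linear(8 * 30 * 30, 10),
).eval()

# Tracing records the operations performed on example input. The video
# uses torch.randn((32, 3, 224, 224)).to("cuda"); a small CPU tensor
# keeps this sketch runnable without a GPU.
example = torch.randn(1, 3, 32, 32)
traced_model = torch.jit.trace(model, [example])

# The traced module behaves like the original model:
assert torch.allclose(model(example), traced_model(example))

# The traced module is then handed to Torch-TensorRT
# (GPU + torch_tensorrt required):
#     import torch_tensorrt
#     trt_model = torch_tensorrt.compile(
#         traced_model,
#         inputs=[torch_tensorrt.Input((32, 3, 224, 224),
#                                      dtype=torch.float32)],
#         enabled_precisions={torch.float32},
#     )
```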
We will combine it with our results cell, copy everything (and of course revert it back afterwards, just so we don't get confused), and then, at the very bottom of our code, paste what we just copied. Everywhere you see model, change it to trt_model; and yeah, I believe that's all we need to change here. Quickly run the cell and check if it worked. Awesome: we're getting the exact same results! So our model now runs on Torch-TensorRT twice as fast as PyTorch with CUDA, and more than 50 times faster than PyTorch on CPU. Good job, you guys!

As a last step, we also need to download this notebook: press File, Download as, and go for the .ipynb notebook format, saving it on our computer. The reason we do this is that as soon as we shut down this Docker container, our notebook disappears as well.

I mean, holy smokes, you guys, I've just noticed something really cool: it looks like we passed an image with a batch size of one to a model compiled for a batch size of 32, which is a big no-no; we should never do this. And even though we got this error over here, it looks like our output was printed before the error was triggered, which is incredible. So whenever you design a model, please make sure the batch size you specify in your model is a one-hundred-percent match to the batch size of your sample data. It has to be; this shouldn't be happening, and I don't know how we got away with it. This is nothing but magic.

Congratulations! Now you know exactly how to load neural networks and how to use them for inference, and the best part is you've learned about the most advanced and most efficient tools, which until now only experts were using. If you're just starting your machine learning journey, you should be extremely proud of yourself, because this is beyond impressive. In the next few tutorials we will focus on some more machine learning benchmarks, and we will see exactly how OpenCL compares to CUDA.

Thank you guys so much for watching! I really hope you enjoyed this video, and if you did, please give it a like. If you have anything to say, please leave me a comment; if you'd like to be extra awesome, please subscribe to my channel and turn on the notification bell. Also, please share this tutorial with everyone you know; don't skip even one person. Thank you so much for watching once again, and I'll see you very soon!
Info
Channel: Python Simplified
Views: 32,144
Keywords: trt, torch tensor rt, torch tensorrt, tensorrt, torch_tensorrt, pytorch, cuda
Id: iFADsRDJhDM
Length: 36min 25sec (2185 seconds)
Published: Sun Feb 20 2022