NVIDIA NIM Is A Game Changer For Generative AI

Captions
What is going on guys, welcome back. In this video today we're going to learn about NVIDIA NIM and why it's a game changer for generative AI, so let us get right into it.

The plan for this video is to start with a brief theoretical explanation, so to explain what NIM is, how it works and why it's interesting. I want to keep this as short as possible because I want to move on to the practical part quickly. There we're going to explore different AI models in the cloud, from the NVIDIA API catalog, and then we're going to move on to a live demonstration where I show you how to deploy models easily with NIM and how to then use them. Now, NVIDIA is sponsoring this video, so they provided me with access to the LaunchPad, an environment where I can use NIM on their cloud servers with H100s, so very powerful GPUs.

Let's start with the short theoretical explanation, just three or four slides to give you an idea of what NIM is about. This is the dilemma that enterprises face nowadays: they have a choice between managed generative AI services, so basically using the APIs of other companies, and do-it-yourself deployment. Both have pros and cons. Doing it yourself gives you flexibility, privacy and so on, but you have to maintain everything. Using APIs gives you ease of use, everything is very easy to set up and you don't need to maintain anything, but you have limited control, privacy issues and limited infrastructure options. That is the dilemma without NIM, and NIM's answer is to combine the two and give you the best of both worlds.

The basic idea is that you get pre-built containers that are already optimized for the hardware of any NVIDIA-accelerated system. You also get industry-standard APIs, so the OpenAI API for LLMs, for example, or the Stability AI API for text-to-image generation. And you still keep the customizability, plus professional support from NVIDIA. So this is the all-of-both-worlds, all-in-one package. Doing it yourself requires a week or more of deployment time, and you can pause the video to compare the bullet points, but the basic message you need to understand is this: with NVIDIA NIM you can deploy generative AI models within five minutes, and you still have all the flexibility, customizability and optimization. You don't need to maintain anything or invest any DevOps engineering time; you can just deploy and use models that are customizable, optimized and very easy to use.

Now let us move on to the practical part. The first thing I want to show you is the NVIDIA API catalog, because this is where you can play around and experiment with models. Usually, before you deploy a model, you want to see how good it is, and that is exactly what you can do here, easily and for free. On the left side you can see different model families and different industries. There are basic reasoning models like Llama 3, vision models for image recognition or OCR, visual design models such as Stable Diffusion and other image generators, retrieval models (embeddings), speech models, for example speech-to-text, so taking speech and transcribing it into text, biology models for things like protein folding, gaming models that create realistic video game characters, healthcare models, and industrial models, for example weather forecasting or process optimization.

What you can do in this catalog is simply click on a model. For example, I can click on Llama 3 and chat with it right here in the web interface: "Hello, tell me something interesting about NVIDIA", and I get a response without deploying anything and without any API key; I can just play around with the model to see the quality of the responses. On the right side I also get equivalent Python code for the request. The same works for all the other models. I can use an image recognition model and ask what kind of animal is present in an image; hopefully it tells me it's a dog, and in this case it does, and it even says a little bit about the breed. For visual design we can go with Stable Diffusion, provide a prompt, say "space explosion", and run it to generate an image. The idea is that you play around with the models before you decide what kind of model you want for your actual deployment. For gaming, for example, we can go to Audio2Face: we have an audio sample, we can adjust the emotional strength and emotional control, and running it generates a talking avatar.

I'm not going to run all of this here; I just wanted to show it to you. You can do it for free yourself via the link to the API catalog in the description down below. And once you say, for example, "I want to use Llama 3", you can move on to the deployment, which is done very easily using NIM.

So now I want to show you how to take such a model and deploy it easily using NIM. We're going to do that with Docker, and then we'll be able to use the model with basic curl requests or with Python scripts, depending on what you prefer. For this, NVIDIA has provided me with access to the so-called LaunchPad, where I just have to follow some very basic instructions. The first step is to go to the model page and press the "Get API Key" button to generate and copy an API key. Then, in the LaunchPad, I follow the instructions: I export an environment variable NGC_API_KEY with my key, and then I export another environment variable with the container name; in this case we go with the Llama 3 8B Instruct model. The final step, and then that's basically it, is to run a Docker command: it uses the container name, enables the GPUs, passes in the API key, defines a cache so I don't have to download the model weights again every time, and points at the NIM container for Llama 3.
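The three commands described above look roughly like this. This is a sketch based on NVIDIA's NIM documentation; the exact image tag, flags and cache path may differ for your setup, so check the current docs before running it:

```shell
# Export the NGC API key generated from the model page
export NGC_API_KEY="nvapi-..."   # paste your own key here

# Container name for the Llama 3 8B Instruct NIM
export CONTAINER_NAME="meta-llama3-8b-instruct"

# Local cache directory so model weights are downloaded only once
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

# You may first need: docker login nvcr.io (user "$oauthtoken", password = NGC key)

# Run the NIM container: GPUs enabled, API key passed through,
# cache mounted, OpenAI-compatible API exposed on port 8000
docker run -it --rm --name="$CONTAINER_NAME" \
  --runtime=nvidia --gpus all \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama3-8b-instruct:latest
```

This is a deployment fragment: it only works on an NVIDIA-accelerated host (like the LaunchPad environment in the video) with a valid NGC key.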
So I just copy and paste that (I always move around too much with my mouse) and run it, and you can see that this starts the deployment process. Notice that I didn't do anything special here: I basically just ran three commands, two exporting the API key and the container name, and one actually running the container. That's all you need to do to deploy the model.

Once this is done, the model is running on localhost, so I can use it locally. Of course it's on NVIDIA's cloud right now because I'm using the LaunchPad, but in practice this would be your own NVIDIA-accelerated environment. To check that it actually works, I can do a simple curl in a second terminal, and I get a response saying that the service is ready. Now I can just use Llama 3. I have a small curl command saved here, basically just a curl POST request to localhost on port 8000, the completions endpoint, asking "tell me something interesting about NVIDIA". I run it and get an answer from the model: "Here's something interesting about NVIDIA: did you know that NVIDIA was originally founded..." and so on.

Since I mentioned it, we can also do the same thing with a Python script. Here the openai package is initialized with a localhost base URL, http://0.0.0.0:8000, so we're not using any external API, not ChatGPT or anything. I'm also providing an API key, which is only necessary outside the NGC environment; I'm inside it right now, so I don't need it.
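As a sketch of what that curl request amounts to, assuming the container exposes the usual OpenAI-compatible /v1/chat/completions route on port 8000, you can build the same request with just the standard library. The function and variable names here are my own, not from the video:

```python
import json
import urllib.request

# The NIM container serves an OpenAI-compatible API on localhost:8000
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_request(prompt, model="meta/llama3-8b-instruct"):
    # Same JSON body the curl command in the video sends
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode("utf-8")
    return urllib.request.Request(
        NIM_URL, data=body, headers={"Content-Type": "application/json"}
    )

req = build_request("Tell me something interesting about NVIDIA")
# Actually sending it only works on the machine where the container runs:
# answer = json.load(urllib.request.urlopen(req))["choices"][0]["message"]["content"]
```

The request is only built here, not sent, so you can inspect the payload anywhere; uncomment the last lines on the deployment host to get a real completion.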
Otherwise I would just paste the API key in here. Then I use the model meta/llama3-8b-instruct, ask it to tell me something interesting about NVIDIA, run python3 test.py in the terminal, and I easily get a response from Llama 3.

Of course, this can be extended to any kind of application. Let's say you have an application based on GPT-4, with an interactive chatbot or connections to other sources of information, all built with the openai package. You can easily switch it to NIM: you deploy NIM using the Docker command and point the client at your localhost base URL instead of GPT-4, without changing anything about your code except maybe the model name and the base URL. That's how easily you can deploy generative AI models using NVIDIA NIM.

Finally, I want to show you an example of how to take a Python script that uses the openai package to interact with GPT-4 and turn it into something that works with Llama 3 on NIM. I have prepared a script on my desktop called chat.py; it uses the openai package, has an API key, interacts with GPT-4 and is basically a chat application. I can run it, say "hello, tell me something about NVIDIA", get a somewhat longer response, and then ask "what is their stock ticker symbol?", and it still has the context and knows I'm talking about NVIDIA. This is the application we want to turn into a Llama-3-based application with NIM. So I copy the code into my clipboard and create a new file, also called chat.py.
Then we'll see how easily we can turn this into something using Llama 3. I paste the code and also open up test.py so we can copy the settings over. So what do we need to do? First of all, we replace the arguments in the OpenAI constructor: we set the base URL to localhost, plus the API key, which is required only if this is executed outside NGC. Then we can change some of the naming: we don't want "GPT-4" anymore, we want "Llama 3", in all the strings as well. That's purely cosmetic, nothing functional; we just want things named properly since we're not using GPT-4 anymore.

After that, basically all we need to do is change the completion settings, which we copy over from test.py. We keep messages=messages, because we want to use the chat history with the system prompt and not the hard-coded "tell me something about NVIDIA" prompt, and we set stream=False, since this code isn't written for streaming (I think False is the default value anyway). If I'm not mistaken, that should be enough to run this on NIM with Llama 3. So I run python3 chat.py, say "hello, can you tell me something interesting about NVIDIA", and I get the Llama 3 response. There you go. "What is their stock ticker symbol?" There you go. So you can see I didn't have to do anything special; I just replaced the URL, because we're not using the OpenAI API anymore but our own Llama 3 NIM container.
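The whole conversion boils down to two small groups of settings. Sketched as plain dictionaries (the identifiers and the placeholder key are illustrative; the OpenAI values are the standard public ones):

```python
# Settings of the original chat.py, talking to the OpenAI API
OPENAI_SETTINGS = {
    "base_url": "https://api.openai.com/v1",
    "api_key": "sk-...",                      # a real OpenAI key is required
    "model": "gpt-4",
}

# Settings after switching to the local NIM container; the API key
# is only needed when calling from outside the NGC environment
NIM_SETTINGS = {
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed-inside-ngc",
    "model": "meta/llama3-8b-instruct",
}

# Everything else in the script stays the same, e.g.:
# client = OpenAI(base_url=S["base_url"], api_key=S["api_key"])
# client.chat.completions.create(model=S["model"], messages=messages, stream=False)
```

Because NIM exposes the same OpenAI-compatible API, swapping one dictionary for the other is the entire migration.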
I could basically leave the rest of the code unchanged: I had to change the initialization settings and the completion settings, but besides that the whole application is the same, because it uses the same industry-standard API.

That's it for today's video. I hope you enjoyed it and learned something. Make sure to check out the links in the description down below; you'll find the link to the NVIDIA API catalog and some additional resources on NVIDIA NIM. If you enjoyed this video, leave a comment in the comment section down below, hit the like button, and of course don't forget to subscribe to this channel and hit the notification bell so you don't miss a single future video, for free. Other than that, thank you very much for watching, see you in the next video, and bye.
Info
Channel: NeuralNine
Views: 6,167
Keywords: nvidia, nim, NVIDIA, NIM, NVIDIA NIM, nvidia nim, genai, generative ai, nvidia ai, deploy ai models
Id: Yk1LI_n5KRs
Length: 15min 26sec (926 seconds)
Published: Sun Jul 07 2024