Nvidia's Free RAG Chatbot supports documents and YouTube videos (Zero Coding - Chat With RTX)

Video Statistics and Information

Captions
Hello again. In this video I want to show you how to install this chatbot with zero coding and then start using it in three different ways: you can chat with your documents (it has a built-in RAG system), chat with YouTube videos, or use it as a general AI chatbot, something like ChatGPT. Last week Nvidia announced a very interesting project called Chat with RTX, and if you check the website you will see that you can download the chatbot directly from there. This chatbot is only for users with an RTX 30 or 40 Series GPU, so you need at least 8 GB of VRAM, and they also say that you don't need access to the internet. How would that work? As soon as you hit "Download Now" you will see that the size of this chatbot is around 35 GB, which means you are downloading the large language models locally onto your PC; that is why you don't need internet access. When you download and install the chatbot, you get this user interface. I want to show you the options you have with it, but before that, let me quickly explain how to set it up.

The first step is simply to download the chatbot using the download button at the top of the website. As soon as you extract it, you can run the installer and install it on your PC. The installation process is very simple as well: first you agree to the terms and conditions; in the second step you select the models you want to use with the chatbot (there are three options: Chat with RTX, Llama 2 13B, and Mistral 7B); in the next step you set the path where you want to install the chatbot; and then you simply install it. During the installation it installs Miniconda and the CUDA Toolkit for you, and the installation itself takes around 20 to 30 minutes depending on your system's specifications. Finally, after it is installed, you can simply close the installer and run the chatbot.

When you run it, this is the first page you are going to see. On the left I have the option to select the AI model, and I have access to two models: Mistral 7B and Llama 2 13B. To explain this without going into the details: Nvidia has made it easier for us to load these two large language models without needing access to an extremely powerful GPU, because we are loading quantized versions of these large language models (see the short code sketch below). On the right we have access to three different features of this chatbot. The first feature is "Folder Path", so we can pass a folder that contains all our documents. The second option is "YouTube URL", so we can pass YouTube URLs and start chatting with them. Or we can simply treat this chatbot like ChatGPT.

Let's test them one by one and see how good this chatbot is. Let's go with Mistral 7B and say hello to it: "Hello! How can I assist you today?" Now let's ask a more technical question, for example, "Explain how to fine-tune a large language model": "Fine-tuning a large language model involves adjusting the model's parameters to better fit a specific task or domain: first, data collection; second, data processing; then model selection, fine-tuning, evaluation, hyperparameter tuning, and deployment." That is a very decent answer in my opinion. It didn't go into the details, and it also gives some examples of how to select your large language model, which is nice. I take it as a very good answer.
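A quick aside on the "quantized version" point above: Chat with RTX ships its own pre-built, optimized model engines, so the snippet below is not how the app works internally. It is only a minimal sketch of what loading a quantized 7B model locally can look like, using Hugging Face transformers and bitsandbytes; the model ID and generation settings are placeholders, not anything taken from the video.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NOTE: generic example only, NOT Chat with RTX internals. The model id is an assumption.
model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# 4-bit quantization is one common way to fit a 7B model into roughly 8 GB of VRAM.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

prompt = "Explain how to fine-tune a large language model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```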
By the way, the speed of the answer was very surprising to me; the speed is very good. I'm using an RTX 3090 with 24 GB of VRAM on my system, and for me at least the speed is very nice. Now let's also test the memory. If I ask "What was my previous prompt?", it says: "I apologize, but I do not have access to your previous prompt." I would have been surprised if it could actually answer, and there is a simple reason behind that: we are dealing with a 7-billion-parameter model, and we definitely have limitations with regard to the context length we can pass to it. Adding memory means adding text to the input of the model, which means the model has to deal with a longer context, and that would definitely confuse a 7-billion-parameter model. So far I'm very happy with the answers; they look really nice and the speed is very good, and we just saw that it doesn't have a memory.

Before switching to the 13-billion-parameter model, let me also show you how much GPU memory this model is using. As you can see, this model is using 7.9 GB of GPU memory; that is why the website mentions you need at least 8 GB, because with that you can at least run the 7-billion-parameter model.

Now let's switch to the 13-billion-parameter model and see how that works. The model is ready, and with this one I want to test two things: first, I want to pass the same question I gave the 7-billion-parameter model to check the speed and see how this one performs, and second, I want to test hallucination. If I ask the same question, it gives me the answer: "Sure, I would be happy to help. Fine-tuning a large language model involves adjusting the model's hyperparameters and weights to improve its performance..." Again it is correct, and as you can see this model gave us a longer and more detailed answer. That is interesting: we are using a 13-billion-parameter model, and clearly it is more powerful than the 7-billion-parameter one. Now let's ask a second question and see if this model hallucinates. As of today there is no GPT-5 out there, and this model definitely does not have knowledge about GPT-5. If I ask "Tell me about GPT-5": "Sure, I'd be happy to help. GPT-5 is a language model designed by Meta AI that is designed to generate human-like text based..." That is not correct. Keep in mind that, just like any large language model out there such as GPT-3.5 and GPT-4, these two models are also prone to hallucination.

Now let's test the RAG side of this chatbot. As I said, there are three options here, and the first option is "Folder Path". If you select this option, you will see that three types of documents are supported: .txt, .pdf, and .doc. In this folder I have multiple documents, and I've already selected it, so as soon as I start asking questions about those documents, the chatbot should be able to give me an answer. I'm using the same documents I used in a previous video called "LlamaIndex versus LangChain", so if you have already watched that video, you know there is a document with three stories in it, and each story is about a separate character. Let's see if our chatbot knows those characters right now.
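For readers curious what a folder-based RAG pipeline that also reports its sources looks like in code, here is a minimal sketch using LlamaIndex. This is not Nvidia's implementation; the folder name and question are placeholders, and it assumes an embedding model and LLM backend are already configured for llama_index.

```python
# Generic folder-based RAG sketch (not Chat with RTX internals).
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load every supported file (.txt/.pdf/.docx, etc.) from an assumed folder.
documents = SimpleDirectoryReader("./my_documents").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query("Who is Fred?")
print(response)

# Like Chat with RTX, report which files the answer was retrieved from.
for node in response.source_nodes:
    print("source:", node.metadata.get("file_name"))
```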
If I ask "Who is Fred?", it says: "Based on the given context information, there is no direct mention of a person named Fred" (which is wrong); "however, there is a character named Fred the Redfish in the story." So there actually was a character named Fred the Redfish, and that is the right answer, but at the beginning the model was a little confused. Let's ask another question from a different document. One of the other documents is the Vision Transformer paper, so I just want to ask "What is Vision Transformer?": "Based on the given context information, the Vision Transformer is a direct application of Transformers to image recognition." The chatbot was able to see some content from the paper; my question was very general, so any part of the paper could probably be retrieved, and this is the response it gives us. One thing you have probably noticed is that the chatbot also adds a reference file at the end. That is a nice touch from Nvidia that I liked about the built-in RAG side of this chatbot: if you click on any of these references, it actually opens the reference with whatever PDF reader you use. This is the stories PDF that I have in that folder, and if I click on this one it opens the paper for me.

In that folder I also have a document about a company that I created for another video on fine-tuning large language models; in it I try to replicate the customer support of a company, and there is a question from that document that I want to ask our chatbot: "How do I perform a factory reset on my Cube Triangle Alpha smartphone?" This is the PDF file for that question, and the answer in it is: "To perform a factory reset, go to Settings, System, Reset options, and erase all data." The chatbot answers: "Go to Settings, scroll down and select System, choose Reset options, select Erase all data." That is the right answer, but it also added a "scroll down" step that I don't see in the document; it added that by itself. So again, be aware that hallucination is part of this chatbot, even though here it is a very small change to the original answer; usually when you have a RAG system you want it to be as precise as possible. Overall I'm very happy with the performance of this chatbot, and the inference time is still very fast, so I'm more than satisfied with the speed. If we check the amount of GPU memory it requires, you can see that it uses 12.1 GB, so you definitely need a GPU with at least 14 GB of VRAM to run the 13-billion-parameter model without any issues.

All right, we just saw the RAG side of this chatbot with the 13-billion-parameter model; I'm not going to test it with the 7-billion-parameter model, but in general the performance is very nice. Now I want to test the last feature of the chatbot, which is "YouTube URL". If you select "YouTube URL", you can see that there is a number here, which I checked, and I think it is the number of URLs whose content you can expect the model to check when giving you an answer. Anyhow, I want to pass two YouTube URLs to our chatbot and start asking questions about them. I prepared these two URLs, which are my previous videos. When I pass these URLs, the first thing it does is take each YouTube URL, extract the text from it, and store it on your local PC, so for this feature you definitely need access to the internet. If you check here, you can see that it is storing the content for me. Let me remove them and pass the URL again. Okay, it just processed the URL, and if I open this folder we can see that it has added the content of that YouTube video.
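The "YouTube URL" feature essentially boils down to downloading the video's captions and saving them as text that the RAG pipeline can index. The video does not show which library Chat with RTX uses for this; the sketch below uses the youtube-transcript-api package as an assumed stand-in, and the video ID is just an example.

```python
# Assumed illustration of the caption-download step, not Nvidia's actual code.
from youtube_transcript_api import YouTubeTranscriptApi

video_id = "8iMIGVWMPPQ"  # the part after "v=" in a YouTube URL
segments = YouTubeTranscriptApi.get_transcript(video_id)

# Join the caption segments into one text blob and save it to disk for indexing.
text = " ".join(segment["text"] for segment in segments)
with open(f"{video_id}.txt", "w", encoding="utf-8") as f:
    f.write(text)

print(f"Saved {len(text)} characters of transcript to {video_id}.txt")
```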
Now if I ask a question about this YouTube video, for example "What is the video about?": "The video appears to be about designing and evaluating a chatbot using different techniques and evaluating the performance of the chatbot using various questions." It got the point, but again that question was very general. Let's pass the other YouTube video as well. If I check, I can see that I now have content from two different URLs, and I can start chatting with both of them. One question I asked this chatbot, which I'm sure it does not have the knowledge for, is about the recent announcement from OpenAI: they announced a model named Sora, as you probably already know. If I ask "What is Sora?", I expect the model to be able to give me the answer now: "Sora is a machine learning model that generates images and videos. The video description mentions OpenAI Sora, suggesting that it is a model developed by OpenAI; however, without more information about Sora..." And it goes further: "Here are some possible questions and answers based on the context information: What is the purpose of Sora? Based on the video description, Sora is a machine learning model that can generate images and videos." This is interesting: it also suggests Q&As based on the context of that video, and at the end we can also see some metadata about the video: the title, the channel, and the upload date. That is a very nice touch from Nvidia, and I should say I'm more than satisfied with the performance of the chatbot right now; considering that this is a 13-billion-parameter model, this is amazing performance.

For the last question, I want to test whether we have a memory with the 13-billion-parameter model. If I ask "What was my previous prompt?": "Based on the given information, I cannot access any previous prompts or context. Please provide a specific context or prompt you are referring to." It still gives me a reference file, because regardless of what you ask, a vector search happens behind the scenes and some content is retrieved (a short generic sketch of this follows at the end of these captions); in this case Nvidia apparently didn't try to distinguish between when the model can give us an answer and when it can't, and just added the reference at the end. I didn't expect this model to have a memory either, because 13 billion parameters is still not considered a big model, and we still have the issue with the context length.

Before I wrap up the video, I just want to show you a weird behavior I saw from the chatbot. If you choose "YouTube URL" and start with a random question, for example "What was my previous prompt?" as the first question, the model does not give a proper answer; instead it just goes through everything it can find. That answer is not right, because there was no previous prompt and this is definitely not the answer to it. So yes, when you use the "YouTube URL" side, the models might behave a little weird, or at least in a more unexpected way; just keep that in mind.

Considering that this chatbot does not require any coding skills, that you can install it within a 30-minute window, and that this is the output you get, it is amazing. Now that you know the pros and cons of this chatbot, I definitely recommend you test it, and don't forget to let me know in the comment section what you think about it.
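As mentioned above, a vector search runs on every query whether or not the indexed content is relevant, which is why a reference file gets attached even to "What was my previous prompt?". Here is a minimal, generic illustration of that behavior (not Nvidia's code), using sentence-transformers with made-up chunks:

```python
# Generic illustration: the query is embedded and compared against stored chunks,
# so some "best" chunk is always retrieved, even when nothing is actually relevant.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Sora is a model announced by OpenAI that generates video from text prompts.",
    "The video walks through designing and evaluating a RAG chatbot.",
]
chunk_embeddings = model.encode(chunks, convert_to_tensor=True)

query = "What was my previous prompt?"  # unrelated to either chunk
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine similarity always yields a top hit, relevant or not.
scores = util.cos_sim(query_embedding, chunk_embeddings)[0]
best = int(scores.argmax())
print(f"retrieved chunk: {chunks[best]!r} (score={float(scores[best]):.2f})")
```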
Info
Channel: AI RoundTable
Views: 3,610
Id: 8iMIGVWMPPQ
Length: 16min 27sec (987 seconds)
Published: Wed Feb 21 2024