Use Open AI (ChatGPT) On Your Own Large Data -Part 1

Video Statistics and Information

Captions
I have been trying ChatGPT and OpenAI models to analyze my own data, and you found them fascinating? I found them absolutely useless. What did he say? Because my own datasets and documents are large. I have so many text and PDF files, and these models have never seen my data. So if I want to use these models for my own data, I have to fit them all in one prompt, and there's a token limit on the input of these models. So I cannot use them for my own, you know, my own use case. I mean, how can I fit the documents? Yeah, but there are still some solutions for it. Oh, I know the solution: I'm going to actually create my own ChatGPT model based on my own data and documents, so I can use it for my own use cases. But I need your credit card, because tuning these models is, you know what I mean... The Bank of MG? No, no, no, no, no, you're not going to do that. Definitely no tuning. Listen, I am doing this. Because there's a way to do it. Actually, listen, there is a better way to do it without fine-tuning and retraining the stuff. You can still use these models with your own large documents. Let me show you. Okay, let's make a deal. I can show you. Then, oh, let's go.

Hello everyone, this is MG, and welcome to another video. We might have already heard a lot about ChatGPT, OpenAI models, GPT models, and so on and so forth. But there's a specific, important question that you might have already asked yourself: how can we use these great models on our own data, documents, and files? Let's say you're a researcher, or you own a company, and you have a huge amount of text and PDF files that you want to ask specific questions about, or gain some insight from, using these OpenAI models. But these models have not necessarily seen your data during their training phase. So the question is: how can I provide my documents and data to these models, to let them understand what I'm asking about and then come up with the answer? These models have a token limit on their input, so you cannot fit all your documents in one
model input or prompt. In this video, I'm going to show you an accelerator that will help you create word embeddings from all your documents and PDF files, so that you can first search for the relevant text using the word embeddings generated by OpenAI models, and then ask OpenAI models like ChatGPT for the insights or information you want, with a prompt that contains the relevant text you gathered from the word embeddings. I know I'm talking a lot, but it's going to be much clearer when we get hands-on with this accelerator and see how it works on Azure. So let's check it out. Before we start, make sure you subscribe and hit the bell icon so you get notified for the next video.

All right, hello everyone, and welcome to another OpenAI video. We already talked about OpenAI and Azure OpenAI: why it was added to Azure, what the benefits are, and how you can gain access to Azure OpenAI. I provided a walkthrough of all these components in the studio, so I'm going to add the relevant OpenAI videos in the video description. Make sure you check that out. This video starts with the assumption that you already know what OpenAI is, what these models are, and what Azure OpenAI is.

So the challenge that we're going to talk about, and of course the solution for it, is this. Here you can ask any specific question using GPT-3 models or even ChatGPT, right? But as you know, they have token limits. That means I cannot add a text file here, or text with millions of words and pages. That is going to give me an error: hey, your input is more than the limit of the model. So now the question is: if I have my own PDF files and documents, and I'm going to ask a specific question whose answer is inside those documents and PDF files, which are mine and which the model has never seen, then how can I use these models with my own data? Some people might think, oh, that's a great use case for fine-tuning and tuning the model. But that's not the
point, because technically, when you tune these models, it's really for the times when you want to provide a new pattern, or something that this large language model has never seen, which is so rare. So what's the other solution? There is actually a simpler way to do it: we can convert our documents to word embeddings.

What does that mean? Let me give you an example. I have some prompts for which I generated the embeddings before, so let me copy one and paste it here. Let's say I want to tell the text-davinci model to convert the following text to word embeddings: "How are you?" Let's see what it gives me as a word embedding. Click on Generate. There you go, it is giving me some numbers. But what do these numbers mean? If you don't know what a word embedding is, I would say check that out and try to search for Word2Vec. That's a technique that converts given words to numerical values, and this is something that the model and the computer will understand. "How are you?" by itself is not understandable for the computer and this model, so on the back end these words get converted to numbers. If I say, for example, "Hey, I'm fine, thank you," or any other sentence here, the word embedding is going to be different. And if two sentences have a similar meaning, let's say "How are you?" and "How are you feeling?", then the word embeddings should be similar in terms of their distance. That's why, when I convert my documents to word embeddings, then using similarity measurements I can find the relevance or similarity of a given sentence to the sentences that I have in my documents and files.

So let me copy that. Let's keep that word embedding for "How are you?", and we're going to do the same thing for "How are you feeling?", which is a similar sentence. Let's click on Generate. There you go. I did this before and it gave me somewhat different numbers, but the concept I'm going to show you is the same. So I'm going to copy that as well, which I did
before. So "Hi, how are you?" and "How are you feeling?" should be similar in terms of the word embeddings, which we're going to check. But let me also create word embeddings for a totally different sentence. Let me delete this, actually, and copy what I tried before: "The color is black," or anything, right? It's not really related to "How are you?" or "How are you feeling?" Okay, so now this is the embedding for this sentence as well.

So we tried three sentences. The first two were relevant, their meaning was sort of close, but the third one is totally irrelevant. Here's another prompt that I tried; let me copy and paste it here. I say that I have three word embeddings: the first one is "How are you?", the second one is "How are you feeling?", and the third one is "The color is black." The first two should be similar, versus the third one, which is not related to the first two. So I ask which two are more similar: the first two, or maybe the third one, which is not really relevant. If I click on Generate: the first two word embeddings are more similar. That means "Hi, how are you?" and "How are you feeling?" are more similar compared to "The color is black," which totally makes sense. But how did that similarity get measured? It can, let's say, use cosine similarity, using dot products of these numbers.

Now we are able to understand which topics and sentences are close to each other. So think about it: I can do the same thing and convert all my PDF files and text files to word embeddings. Later, when I ask a question, let's say this is my question, or these are all the questions that I have about my text, I can grab this question, generate word embeddings for that question, and go to my database, where I have word embeddings for all my documents. I check which document, among the word embeddings that I created before, is closest to the user's question, then add that relevant text to my prompt here, and then
answer the question. That means first we retrieve the relevant text from all my texts, and then we ask the question. When you bring just the relevant text, not all the documents, you're resolving that token limit. You don't need to bring all your documents into one prompt; you bring the text that is relevant to the question the user is asking into the prompt. How can you bring that? You need to generate word embeddings for all documents, then measure similarities, and bring the relevant pieces into the prompt.

I wanted to explain what we're going to do, but you're not going to do it all manually, because there's a very nice accelerator, and kudos to the people who contributed to it, that will help you do that. This is the architecture; let me maximize it a little bit. Let's say I have so many PDF files in my company that are relevant to my own use cases or domain. By the way, I'm going to add the link to this GitHub repository in the video description so we can give it a try. This accelerator gives me a web application that I can run, and in the UI, in my browser, I upload my company's PDF files and documents, which can be huge.

First of all, this automatically generates text from those PDF files, because PDF files are not just simple text. You have to recognize the documents: maybe there's a table, maybe there's a chart in those PDF files. Azure Form Recognizer is a service that can detect information from those charts, or whatever you have in the PDF files, and if the documents are not in English, you can even translate them to English. So you create raw text out of any unstructured document you put here: PDF, DOCX, TXT, etc., whatever. Then, with Azure OpenAI Service, you create word embeddings, the numbers that I just showed you, for all the text that you have here. Now you certainly need a place to store those
word embeddings, right? Redis is a great place to do so, because it's pretty fast: it can cache the data in memory, and you can retrieve the embeddings. So I have the word embeddings stored as vectors in Redis. Later on, when my user asks a question, using ChatGPT or OpenAI models, that is relevant to my own documents in the company, I go back to my Redis and find the closest word embedding to the question that the user asked. Then I check what the text is for that chosen word embedding, so I figure out that chunk of text. Then I bring that chunk of text into the prompt of the user and let ChatGPT or OpenAI models answer. For example, here is a very specific question: "There is a red blinking light on the C4 button of my SP 300 machine. Which part do I need to replace?" This is not something related to the web dataset; this is a question about your documents. And the answer can be provided here, because I was able to grab the relevant information using the word embeddings. Let me show you in action; that will certainly make much more sense.

So how can you run this web application? There are multiple ways. You can run it on Azure, which is what we're going to do with these services, or you can run everything locally in Docker. Check the different ways out. I didn't write these myself; I just tried the first one, which is deploying on Azure, and I'm going to show you how to do it.

Please be aware that before you deploy this, you first need to have OpenAI on Azure. I already talked in the previous videos about how you can get access to Azure OpenAI: there is a request form you need to fill out, and then in your Azure portal you will see Azure OpenAI created for you. Then, if you want Form Recognizer, you need to create Form Recognizer on Azure; I'm going to show you how to do it. And it's optional, but if you want Azure Translator, in case those documents are not in English and you want them in English, then you can have Translator. So I'm going to show you
how I created those, but let me show you how to deploy on Azure first. I click on Deploy on Azure. It is asking me which account I'm going to log in with; I already selected that. I'm not going to create the deployment, because I already did, but I just want to show you how you can do it. Let's wait a bit for the deployment page to load. There you go.

It is asking me for the subscription and the resource group; you can create a new resource group if you want. Then the region, and the resource prefix, which is a name that will be used for all the resources that are going to be created. I added, for example, MG. Then it is asking me for a container name for Redis and a password; you can change that if you want. For the other names it already uses some prefixes, and you can keep them as is.

The things that you definitely need to change are these. First, what is your OpenAI name? I'm going to show you my OpenAI resource: it is test mg, which I already created, and I have access to Azure OpenAI. Okay, so I add test mg. Oh sorry, let me go back to the deployment. Then the OpenAI key. If you remember the previous videos, each Azure OpenAI resource that you create has a key. If you click on the keys section here, you can copy the key and the endpoint. I think it just needs the key; let me double check. Yes, just put in the key and that's it.

Then the OpenAI engine: which engine are you going to use for answering users' questions? You can use, let's say, ChatGPT, which is GPT-3.5 Turbo. And of course you need to create word embeddings for your documents, right? Some OpenAI models can create word embeddings, but not all of them. Here text-embedding-ada has been used; again, you can change it.

Then Form Recognizer: you need the endpoint of Form Recognizer and the key. How? First you need to create it. Just search for Form Recognizer, there you go, and you can create one. I already created one, so if I click on it and go to Keys and Endpoint, you can copy the key and endpoint shown in the portal and paste them here. It's exactly the same thing for Translator: just
search for Translator in the Azure portal, create one, and copy and paste the key there. And of course, set the region that you want to deploy to. That's it; you click on Review and Create.

What's going to happen? In the resource group you specified at the top, it creates a bunch of resources for you. Let me show you what you will see. Let me go to my resource group, this is the one I created, and there you go: you can see that it created container instances, a storage account, an app to launch the web application, and a function app for running some code from the repository. Let's launch the web app. I click on App Service. This is the prefix I used, which is why the name is MG, but yours can be different. Then I can launch my web app using the domain shown on the right side. There you go, now my web app is up and running, and I'm going to show you what I did previously with it.

First of all, you need to add your own documents, right? So click on Add Documents. These are the documents that the model has never seen. You can now drag and drop your PDF file, JPEG, whatever, and Form Recognizer generates the relevant text out of them. Or, let's say this is not a file but a large text that you have: you can copy and paste it here and compute embeddings. You can even add documents in batch; if you want to load multiple files or multiple documents at the same time, this is the place to go.

I already downloaded two articles. One of them is from Google Scholar: a research article published, I think, just in 2023, if I'm not mistaken. Yes. This is something that the models have never seen, and I want to ask specific questions about this research document. Let's say I am a researcher. I'm not going to read this whole paper, but I want to upload it and ask a specific question, for example: what were the features used to train the model mentioned in this document? For example, "the feature list of the hypertension prediction model": this is
actually what I used, and it gave me that feature list without my opening the document. It retrieves all the feature names for me. We're going to come back to this.

The second document I uploaded is, I think, Bank of America's annual report for 2022, and I want to ask a specific question, for example: what is the responsible growth mentioned in this annual report by Bank of America? And that's a large document, by the way: 226 PDF files, uh, sorry, pages.

Okay, so let me go to my web app. There you go. If I click on Document Viewer, you can see that I have already uploaded these PDF files and they have been converted. And if I go to Index Management, you can see it converted my documents into a bunch of chunks of text: here's the text, then the ID, and the embeddings for that text.

Okay, so let me go to the place where I can write down my query. Actually, I think I have my query already, so let me just copy the question that I had. Let's say this is the question: "What is responsible growth?" Where did I grab that question? Well, in that annual report I found this keyword, and I just want to see if it's going to retrieve this for me without my giving it the text, the PDF, or whatever. Okay, actually I think it already worked. There you go: "What is responsible growth?" Responsible growth: number one, all the way to four, grow in a sustainable manner. Okay, let me check whether that's relevant or not. There you go: number four, grow in a sustainable manner. So it was able to retrieve that information from the PDF file without my specifying the text here.

But how did the magic happen? Here's the thing. This is the prompt: "Please reply to the question using only the information present in the text above." This is the prompt that was used for asking the OpenAI model. But what is "the text above"? That text was brought in from the word embeddings using a similarity measurement. So it was able to understand that the word embedding for "responsible growth" is close to that part of the text of
that PDF file. So it brought all that text information into the prompt, then asked my question, and then the prompt got answered here. If I click on Settings, this is the prompt that was being used. We can also change the response length, the temperature, the stuff we talked about already, and even the language if you want to change it.

Okay, let's check out the other PDF file, which is a research paper. I want to ask: give me the list of features used for the prediction model in that research study, right? So this is the question. Let me go back to my web app, paste that question, and hit Enter. It's running here. There you go. Let me check: "What were the feature lists for..." Okay, let me first of all check what the prompt was. Let me go back, stop the recording from here.

So here's my web app. I'm going to paste that question: "What was the feature list of the hypertension prediction model?" Let's check what it's going to answer. It's running, and there you go: it gave me the whole list of features used, which were mentioned in a table in the PDF file. So Form Recognizer was able to detect it and generate the text. And that's the prompt, the same thing: it got the relevant text from that PDF file and added it to the prompt, and the model was able to finalize and return the answer.

If you want to play around with this web application, you can even do some text analysis. Here's an example for summarization: if you click, it gives me the summary, just making sure that the web app can call the OpenAI models that you specified. You can extract specific data from a given conversation, or use just a general playground if you want to play with this data. For the settings, we already talked about that: this is my OpenAI resource with the model, which is text-davinci, that I'm using. And that's it.

So I just wanted to show you that this is one example of how to do it. There are more ways to deal with these models and your own large documents. Check out the repository that I'm going to add in the
video description. Again, you can run it in your own container, not necessarily just on Azure. I hope the idea of how you can use large documents with OpenAI models now makes more sense to you: you know the flow, and here's a quick start to leverage it. And that's all. I hope you enjoyed the video.

Are you sometimes overthinking? If yes, let me tell you something: there is no past and future. Everything is right now, this moment that you're listening to me. Am I wrong? The past is just a bunch of memories in your head, and the future is a projection of those memories into the future. So imagine that you were just born into this life and into this world, and you know nothing. Is there something that you would be worried about then? Just relax, breathe, and, as I always say: dream big, believe in yourself, and take action. [Music]
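
The retrieve-then-ask flow described in the captions above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the accelerator's code: `embed` is a toy bag-of-words stand-in for a real embedding model such as the one called through Azure OpenAI, an in-memory list stands in for Redis, and the function names and sample documents are hypothetical. Only the cosine-similarity step and the wording of the prompt instruction come from the video.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Dot product of two sparse vectors divided by the product of their norms."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def chunk_text(text, max_words=50):
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_index(documents, max_words=50):
    """Return (chunk, embedding) pairs; an in-memory list stands in for Redis here."""
    index = []
    for doc in documents:
        for chunk in chunk_text(doc, max_words):
            index.append((chunk, embed(chunk)))
    return index


def retrieve(question, index, k=1):
    """Return the k chunks whose embeddings are closest to the question's embedding."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Put only the retrieved text in the prompt, followed by the instruction and question."""
    context = "\n".join(context_chunks)
    return (
        f"{context}\n\n"
        "Please reply to the question using only the information "
        "present in the text above.\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical sample documents, made up for illustration.
documents = [
    "Responsible growth means growing in a sustainable manner.",
    "The color is black.",
]
index = build_index(documents)
question = "What is responsible growth?"
top_chunks = retrieve(question, index, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

Because only the top-ranked chunks go into the prompt, the prompt size depends on `k` and `max_words` rather than on the total size of the document set, which is how this flow works around the token limit. With a real embedding model, sentences with similar meaning but different wording would also score as close; the toy `embed` here only rewards shared words.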
Info
Channel: MG
Views: 46,113
Keywords: open ai, gpt 3 ai, gpt 3, azure ai, artificial intelligence, machine learning, openai chatbot, gpt 3 demo, gpt 3 fine tuning, open ai chat, openai gpt 3, gpt 3 prompt engineering, openai chatbot gpt writing code, openai chatbot demo, chat gpt explained technical, open ai chat gtp, Azure, Azure Open AI, ChatGPT in Azure, Open AI in Azure, Open AI in Azure Demo, Azure Open AI demo word embeddings, chat gpt, advanced tutorial chatgpt, chatgpt advanced guide
Id: eNKu307k59g
Length: 23min 32sec (1412 seconds)
Published: Tue Mar 21 2023