Azure OpenAI - Chat with Your Own Data

Captions
Welcome to the fourth video in the series on Azure OpenAI Service. In this video, we're going to learn how to work with the Add your data functionality. This feature is still in preview, and it's possible that it will change and improve in the coming weeks and months, so when you actually work with this feature in the future it might look a little bit different, but the overall idea will remain the same. The changes that happen to this feature I'll cover in separate videos. Now, with that out of the way, let's get started.

Let's first understand why we need a feature like this. We are working with a large language model. A large language model is trained on data from the public internet, so it does not know anything about our private data. But what if you want to ask questions about your private data? For example, let's say you have a lot of documents and you want to find information in those documents using this chatbot interface. That is where this feature comes in.

Now let me show you how this works. I can just click on Add data source. We have three options here: we can upload files from here itself, we can select a storage account, or we can use an Azure Cognitive Search service where we have everything indexed. For our first demo, let's try this option here, upload files, and for this we need to select a blob storage account and an Azure Cognitive Search service. Let me create a blob storage account and an Azure Cognitive Search resource. I'm going to use this script for doing that. I'm creating a resource group, a storage account, and a search service, and the search service should be under this SKU; let me come back to that later. All right, I'm just going to run this script.

I have just created the storage account and search service, and if I move on to the Azure portal, we have those two resources in this resource group. The first demo that I'm going to show you is uploading files directly to this service. Let me click on Add data
source, and we have these three options here. I'm going to go with upload files first. As you can see, we need to select a storage account to store these files, and we need to enable CORS as well to access those containers, so I'm just going to enable cross-origin resource sharing. And now we have to select an Azure search account.

Now let's understand why we are selecting a Cognitive Search resource here. We all know that we are working with large language models, and there are two reasons why working with our own data with these large language models is not easy. The first reason is that the model is already pre-trained, so that limits your ability to customize it. You need a lot of compute power to change it, so it's hard to add new data to it. The next thing is that when you interact with these large language models, they have a token limitation. In the case of this deployment here, we have a 4,000-token limitation.

To get around these limitations, we have this pattern called retrieval-augmented generation. Let me just show you one document from AWS. We have three entities: the orchestrator, the large language model, and the relevant search information. When we need to query something, this orchestration system will search for the relevant information and construct a prompt within the token limit. It will contain the prompt, the query, and the enhanced context, and it passes that to the large language model and gets a response back. If we didn't have a limitation on the number of tokens that we are passing in, we could have just passed all the documents into the large language model and asked the question. But we have a token limitation, so we need to identify only the relevant documents, only the relevant chunks of those documents, to pass in. And in addition to the prompt and the query, this orchestration system should handle the context as well. Now, for example,
if we ask "Who is your best friend?" and in the next question we ask "What is his age?", the next question only contains "What is his age?", so this orchestration system should somehow manage the context as well. That's why we have the enhanced context. I'm showing a document from AWS, but this is a general pattern. As you can see, we have the same principles here as well; the same architectural components are here as well. We have the application, we have the orchestrator, and we have the large language model separately from the data, and when we want to query something, we need to pass in the prompt and the knowledge as well. This is the pattern that they have implemented here, and that is why we are selecting a Cognitive Search resource. And if I scroll up, this GitHub repository contains a kind of custom version of what they have offered here. Basically, this will create all the resources in Azure, and it has the scripts to chunk the documents and everything. You can check this out if you want.

Now, going back to the Add data source functionality: as you can see, I need to select a Cognitive Search resource, and I need to create an index as well. Indexes are the way that Azure Cognitive Search manages your own data. Basically, the idea here is that you can create an index, an index can have multiple fields, and you can make the data searchable, sortable, and things like that. I will cover them in a minute. I'm just going to call this document index. All right.

As you can see here, I have to pay for this Cognitive Search account. If I click on this and scroll down, as you can see, we have a few pricing tiers. The free one would not work with this feature, so you need to go with the basic one. For the basic one, you have to pay 73 dollars per month. You should keep this in mind when working with this feature. So the bad thing is that you have to pay, but the good thing is that you don't have to worry about the amount of data that you're
uploading into this storage account to ask questions. As you can see, on the next page we can upload the files. Let me upload one file. I'm just going to upload this one. It is a company description of a software company. As you can see, it has some information about the company and the team, the responsibilities of the members, the products and services that they offer, and hourly rates. And it gives me a summary, so let me just save and close. As you can see, this may take a few minutes, so let's wait around 5 to 10 minutes.

The process is now complete, and we have this document search index created in Azure search. Let's just try asking a question here. It understands the data; we can ask any questions about this document. Let me ask "Who is the project manager?", something like this. Yeah, as you can see, it does understand the data to a really good level. Let's ask another question: "What technologies do you use for custom software development?" Sometimes it hangs; this is what I have observed so far. Let's just clear this chat and ask the same question again, and let's see how it behaves. As you can see, it now works. Maybe this is because of this chat UI; maybe if we publish this bot, this won't happen. So as you can see, it correctly answers those questions as well. Now let me ask one more question, something like this: "What are the hourly rates for web application development?" Yeah, it understands the document well, and now it's giving me all the rates of all the services that they're offering here. We can ask it to format this as a table as well. Yeah, as you can see, it can format the data and present it, so it understands the structure of it. This feature works well, except there were some glitches here and there, and I believe that is because this feature is still in preview.

The next thing that I'm going to show you is that we can easily publish this bot. Now let me just create a new web app
using this button here. You can name the bot something like this. Let's just select the subscription, the resource group, and the location. I'm going to go with a basic plan. After filling in all this information, I can just click on Deploy here to deploy the bot. All right, as you can see, the bot is getting deployed, and this process will take five to ten minutes. After that, we'll have an app service that hosts the bot.

All right, as you can see, the bot has been published. Now if I go into the resource group, we have the app service in place, and we have the app service plan as well. If I just go directly into it, as you can see, I am getting redirected to this page here, where I can log in to the bot. This bot is not public, since we're using our private data. If I go into the authentication section of this web app, the authentication that is being used is App Service authentication, and we're using Microsoft as the identity provider for that. You can customize this if you want. Now if we go into the bot that we have deployed, we can ask questions about the documents that we have uploaded. Let's see. As you can see, the bot works. We have uploaded the files and configured this, and it is working fine.

As you can see, I'm in the Cognitive Search resource, and we have a few options here. If I go into Indexes, as you can see, we have one index, the document index, and we can do searches here. One thing to notice is that even though we have uploaded one file, there are two chunks here. So it seems like Azure OpenAI is chunking the documents, basically doing its own optimizations and best practices. The URL is the same, but we have two chunks. We have the title, and we have the content, which is where the actual data is located, and we have the file path and the URL as well. They're using this information in the bot to show the references. Now if I go back, there are these references here; as you can see, we can see the
reference information as well, and since we have the direct link to the blob, we can display the exact PDF as well.

Now, what if we want to add more data to this bot? In that case, we don't have a proper mechanism for doing that yet. In this demo here they have the scripts for doing this, but you'd have to deploy it in a fully custom manner. One thing that you can experiment with is the Azure search service. This is the document index, right? We have this option here to add data sources and create an indexer, so let me show you how to do that as well. I don't think this is the recommended approach, and you will lose some of the actual benefits of chunking and things like that, but this will work temporarily. I can name it document. Basically, first I have to configure the data source for Azure Cognitive Search to index, and then I have to select a connection string. This is the storage account that I have created, and inside that we have the file upload document index container, which was created when I uploaded the files manually; this is the container that the file goes to. As you can see, now I have everything in place, so I'm just going to create this data source. All right.

Then I can go into Indexers and create an indexer. Let me show you how to do that. I'm going to select the index that we want to update, and then I'm going to select the data source. We can schedule the indexing process, so I'm going to go with the custom option here and have this indexer run every five minutes. I'm going to keep all these configurations as default. You can change them based on the documents that you're uploading: if they have images or if you need PDF text rotation, you can configure that here, and if you want to exclude some files, you can do that as well. Then I'm going to save this indexer. All right, the next thing that I'm going to do is upload another file to this
blob storage. All right, I'm in the blob storage account that I created earlier, and I'm going to go into this file upload index container and upload a new file here, an employee handbook. This is basically a new document, and it is about a completely separate company. Let's see how this works. All right, I have uploaded the file. Now if I go into Indexers, the indexer that we created is in this list, and I can just run it manually. Let me do that now. All right, let me click Refresh here; as you can see, I have run it again. Now if I ask this question... okay, yeah. As you can see, by updating the index using indexers, we can add more data.

Let me just try to show you the documents here. As you can see, when we upload these documents using indexers, this metadata won't be correctly updated, and there are no chunks either, so this is not a proper approach for doing this. Maybe in the future, when the full version of this is released, we'll have more control over this feature.

In this video, I wanted to show you how to work with the Add your own data functionality. If you have any comments or video suggestions, please let me know down below. I will see you with another video like this soon, and thanks for watching.
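
The retrieval-augmented generation pattern described in the video (find only the relevant chunks, then build a prompt that stays within the token limit) can be illustrated with a minimal sketch. This is not how Azure OpenAI or Azure Cognitive Search implement it. All names here (`Chunk`, `count_tokens`, `score`, `select_chunks`, `build_prompt`) are hypothetical, token counting is approximated by counting words, and relevance is a toy keyword overlap rather than a real search index.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    title: str    # e.g. the source file name (hypothetical field)
    content: str  # the text of this piece of the document


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    # A real deployment would use the model's own tokenizer.
    return len(text.split())


def score(query: str, chunk: Chunk) -> int:
    # Toy relevance: how many distinct query words appear in the chunk.
    query_words = set(query.lower().split())
    chunk_words = set(chunk.content.lower().split())
    return len(query_words & chunk_words)


def select_chunks(query: str, chunks: list[Chunk], token_budget: int) -> list[Chunk]:
    # Rank chunks by relevance, then keep the best ones that still fit
    # inside the token budget reserved for context.
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        if score(query, chunk) == 0:
            break  # remaining chunks share no words with the query
        cost = count_tokens(chunk.content)
        if used + cost > token_budget:
            continue  # too large to fit; try a smaller chunk
        selected.append(chunk)
        used += cost
    return selected


def build_prompt(query: str, chunks: list[Chunk], token_budget: int = 50) -> str:
    # Combine the selected context and the user's question into one prompt.
    context = "\n".join(
        f"[{c.title}] {c.content}" for c in select_chunks(query, chunks, token_budget)
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Calling `build_prompt("what are the hourly rates", chunks)` with a list of `Chunk` objects returns a single string that an orchestrator could send to the model. Chunks that do not fit in the budget are skipped, which is why splitting documents into small chunks matters under a 4,000-token limit.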
Info
Channel: Meet Kamal Today - Cloud Mastery
Views: 24,470
Keywords: azure openai, azure openai service, azure openai add your data, chat with our data, chatgpt custom data, azure chatgpt chat, custom data chat, add your data, add your data azure openai, Azure openai tutorial, openai tutorial playlist, customize chatgpt, custom chatgpt, chatgpt data, chatgpt my data, personal data chatgpt, personal gpt
Id: LhX25V3cc2E
Length: 14min 40sec (880 seconds)
Published: Mon Jul 24 2023