How To Integrate OpenAI With Azure Vector Search aka Azure Cognitive Search

Captions
Hello everyone. In this video I will show you how to integrate OpenAI with the Azure vector search database. We will start by creating an instance of the Azure Cognitive Search service, then grab an OpenAI key. Then we will read our document, create chunks out of it, generate embeddings for those chunks, and push them into our vector search database. Once the data is in the database, we will fire queries, get the context back, and pass that contextual information to the OpenAI completion endpoint. So let's get started. I am not going to talk much about the theoretical portions, because I have already covered a few of those concepts in my earlier videos.

First of all, I will show you how to create an instance of the Azure Cognitive Search service. This is my Azure portal; make sure that you have an active subscription and you are logged in with your credentials. Then you can go ahead and search for "cognitive search", click Create, and furnish the basic details, which you must have already done if you have interacted with any Azure service: select your subscription, provide the resource group (you can utilize one which is already created, or feel free to create a new one), provide the service name, which has to be unique, and choose the location closest to you, or where your other instances are deployed if you are going to use other Azure services with this particular solution. Then comes the pricing tier; there are multiple pricing tiers, and you can choose whatever works for you. For this demo I am going with the Basic one. Once all these details are populated, click Review + Create. In my case I have already created this instance, so I am not going to create a new one; rather, I will show you how it looks. Once your instance is created you will see a page like this, and under Indexes there is nothing to see, because this is a freshly created instance.

The second thing is to grab your OpenAI key. If you are not sure how to do that, quickly go to platform.openai.com, log in with your credentials, click on your name on the right-hand side, and you will see "View API keys"; click on that option, and this is where you can grab the key. If you have already made a note of your keys, you can use the existing one, but if by any chance you don't remember your key, create a new one using the "Create new secret key" button.

In this entire scenario we will need, like I said, three key things: the OpenAI API key, the key from the Azure instance, and the endpoint. I can show you where to get the key and the endpoint: on the left-hand side, click Keys, and here you can see the keys, so grab one. In my case I grabbed the primary admin key, but you can choose based on your preference. Second, you need the endpoint for the service: just click the Overview button, and you will see the URL, starting with https; it will definitely be different for you, based on your resource name. This is what we need to get started with our solution (a sketch of the setup cell follows below).
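The notebook cell itself isn't reproduced in the video captions, so the following is a minimal sketch of the setup step. The variable names and placeholder values are mine, not from the video, and the snippet assumes the pre-1.0 openai Python package that was current when this video was published.

```python
import openai

# OpenAI API key, from platform.openai.com -> View API keys
openai.api_key = "<YOUR-OPENAI-API-KEY>"

# Azure Cognitive Search endpoint (Overview page) and admin key (Keys page)
search_endpoint = "https://<your-service-name>.search.windows.net"
search_key = "<YOUR-SEARCH-ADMIN-KEY>"
```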
Now, moving on to the code, the next thing we need is the input file. For this demo I am taking a text file, the handbook text, which is the same file I grabbed from the Azure documentation, where they talk about a fictitious company, Contoso: this is the Contoso Electronics employee handbook. It is not very huge, but I would say it's a good file. You can see there are some special characters which I have not cleaned up; when you are working with real-world scenarios or your actual data, make sure to clean your data before using it.

Now you have your data ready and your keys ready, so let's get started with the code. The first thing we need is to import the required packages. Before importing, let me get into my virtual environment, so that I need not install my packages again and again. Okay, my environment is activated; let me remove the things we don't need. The first package we need is openai; then we need os, because we are going to interact with the local file system; and then there are a few imports pertaining to Azure, and inside models you can see the various classes we will be using. Once these things are done, I will run this cell. Oops, something is wrong here; let me quickly have a look. A name is not defined, so let me run the cell where I have defined all my keys. It is up to you whether you take them from environment variables or set them as constants; I have set them as constants in this step. Now I have run that cell and it is successful.

Next we will create the vector configuration. For the vector configuration you need to define your configuration name; this is the class we are using to create the configuration, and this is the name of my config, which you can definitely name whatever you want. As of now it supports the HNSW kind, and it takes various parameters. For what these parameters are and how they contribute to the vector search, there is very good documentation which I would suggest you go through, but in a nutshell I can tell you that setting these parameters will make your search very efficient. Let me point you to the documentation: this is where the details about the HNSW parameters are, and you can see efSearch as well as m and efConstruction, and the metric; these are the parameters we have used over here. You can go ahead and read about them, and you can also see that setting these values very high may impact the output, so make sure you go with optimal values. Let me run this cell as well; it is successful (a sketch of the configuration cell follows below).
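As a sketch of the vector-configuration cell, using the class names from the azure-search-documents 11.4 beta SDK that was current when the video was published (later SDK versions renamed these classes and moved to vector search "profiles"); the configuration name and parameter values are illustrative, not taken from the video:

```python
from azure.search.documents.indexes.models import (
    HnswVectorSearchAlgorithmConfiguration,
    VectorSearch,
)

# HNSW configuration; the name is referenced later by the vector field
vector_search = VectorSearch(
    algorithm_configurations=[
        HnswVectorSearchAlgorithmConfiguration(
            name="my-vector-config",
            kind="hnsw",
            parameters={
                "m": 4,                 # bidirectional links per node
                "efConstruction": 400,  # candidate-list size at index time
                "efSearch": 500,        # candidate-list size at query time
                "metric": "cosine",     # similarity metric for embeddings
            },
        )
    ]
)
```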
Next, now that we have our vector configuration ready, we need to define what fields our index should have. In my case I am going to save a document ID, because we need a column, or a field, which can uniquely identify your record. For simplicity I am not taking many columns, rather just going with these three: the document ID; the content, which is what's in my text file; and the embedding column, meaning the embedding generated for that particular content. These are the three columns I am expecting in my index, and here comes my index name, which is handbook. When you are giving a name to your index there are certain rules; I would recommend you go ahead and read the documentation, but the one I can remember is that it will not take a capital letter in the index name, and similarly there are restricted special characters which you cannot use. Just keep in mind that if it doesn't work, you are probably messing something up with your index name.

Now we have the fields ready, the index name ready, and the client object created; see that in the client object we are passing the endpoint and the search key, so this is the place where we use those two credentials. Then, in the last line, we say: go ahead and create an index for me, using the create-index function, which takes three things: your index name, the fields which you are going to accommodate in your index, and your vector search, which is nothing but your configuration. Once these things are ready we can run the cell, which is successful.

Let's go ahead and chunk our document. Like I said, we cannot pass the document in one shot, and that's the reason I'm using LangChain here to split the entire text. This is again not a restriction, that you should go with only this kind of split; you can have your own mechanism to split or chunk your text, but the ultimate goal is to get chunks out of the entire text, so that rather than sending the single document in one shot, we send small pieces. Here I am creating chunks of size one thousand with an overlap of 50, because it's okay if I lose some information, but I would still like to have a 50-character overlap so that I don't miss it completely. Let me run this as well; it is successful, and you can see these are the two documents created out of my input text, which was a single file. Each has a page content, and at the end there is another property, metadata, which is holding my document name. If you want, you can utilize this; in my case I'm not very interested in the document name, because at this point I have a single document, but if you have multiple text files, or you are reading a directory, then it's good to have the document name as part of your index as well (sketches of the index-creation and chunking cells follow below).
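A sketch of the index definition, again assuming the 11.4 beta SDK; the field names are mine (the actual names used in the notebook aren't readable from the captions), and the 1536 dimensions assume OpenAI's text-embedding-ada-002 embedding model:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchableField,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SimpleField,
)

fields = [
    # unique key for each record
    SimpleField(name="document_id", type=SearchFieldDataType.String, key=True),
    # the raw chunk text
    SearchableField(name="content", type=SearchFieldDataType.String),
    # the embedding; dimensions must match the embedding model's output
    SearchField(
        name="content_vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1536,
        vector_search_configuration="my-vector-config",
    ),
]

# index names must be lowercase and avoid most special characters
index = SearchIndex(name="handbook", fields=fields, vector_search=vector_search)
index_client = SearchIndexClient(
    endpoint=search_endpoint, credential=AzureKeyCredential(search_key)
)
index_client.create_index(index)
```

And a sketch of the chunking step. The video only says LangChain is used with a chunk size of 1000 and an overlap of 50, so the specific loader and splitter classes below, and the file name, are my assumptions:

```python
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# load the handbook text file and split it into overlapping chunks
documents = TextLoader("handbook.txt").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
chunks = splitter.split_documents(documents)

print(len(chunks))             # two chunks for this input file
print(chunks[0].page_content)  # the chunk text
print(chunks[0].metadata)      # e.g. {'source': 'handbook.txt'}
```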
Okay, let's close this one. The next thing we are going to do, now that we have our chunks ready, is push those chunks into the vector database. Before moving on, let me quickly show you the current state of our vector database. Go to Indexes, and here you can see that the handbook index is created, because we already ran that cell, along with the fields we defined. Right now there are no documents, because we have not uploaded anything, and that's the reason it is showing zero bytes. So let's create some documents to push into our vector database. What we are doing here is: we have these chunks ready, and now we need to extract the information out of them and create records matching the schema we defined in the vector database. According to the schema, we first need a document ID, so in my case I have just generated a GUID, or UUID, whatever you call it; then I have the content, which is nothing but the page content; and then I generate the embedding, using the function I have defined above: I'm just passing the content, and it goes ahead and generates the embedding for that particular content. Then I save everything into a JSON file. Let me run this cell; again we end up with an error, because the embedding function is defined in a cell above which I need to run first. Okay, now it works. Let's have a look at the file which got generated, on the left-hand side: it has a document ID on the first line, then the content, followed by the embedding. This is what we are expecting, and this is what we are going to push into our vector database.

Now we have the JSON ready with us; next we need to upload it. For uploading we have just a few lines of code: we read the JSON file into a variable, and we use the search client, which has a function called upload documents; it takes whatever is present in our JSON file. This is again straightforward code, and we are not doing anything much over here.

So let's go ahead and try out a few queries. Now I want to ask: what information does this handbook have? Here I'm just using the Vector class, which takes the embedding generated for the query, because we are comparing embedding against embedding, right? You need an embedding of your query which can be compared against the embeddings stored in the vector database, and this is the line doing that for us: we say, generate the embedding and get me just the top two results. Once we have those top two results, we can get them printed; the result content is the one holding those top two rows. Next, we need to club those rows together, because that will be the context for our OpenAI completion endpoint; that's the reason I am generating this input-text variable. And here is the simplest code, in which we call the completion endpoint. This is my model; you can definitely go ahead and use the model of your choice. It could be a chat model, but if you go with chat you need to define the roles, so to reduce that overhead I just went with turbo-instruct. My prompt here says: answer the question based on the given input, where the input is nothing but the content we grabbed from our vector database, followed by the query. Let me run this, and you can see that this handbook contains information about Contoso Electronics, including mission, values, performance reviews, and all these things. This is how you can get the output (sketches of these cells follow below).
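A sketch of the embedding and JSON-building step, continuing from the earlier sketches (it reuses the `chunks` list and the openai setup). The pre-1.0 openai embeddings call is assumed, along with the text-embedding-ada-002 model and the output file name:

```python
import json
import uuid

def generate_embeddings(text):
    # pre-1.0 openai API; newer versions use client.embeddings.create(...)
    response = openai.Embedding.create(input=text, model="text-embedding-ada-002")
    return response["data"][0]["embedding"]

# build one record per chunk, matching the index schema defined earlier
input_data = []
for chunk in chunks:
    input_data.append({
        "document_id": str(uuid.uuid4()),
        "content": chunk.page_content,
        "content_vector": generate_embeddings(chunk.page_content),
    })

with open("docVectors.json", "w") as f:
    json.dump(input_data, f)
```

And a sketch of the upload, vector query, and completion steps. The Vector class shown here is from the 11.4 beta SDK (the GA SDK replaced it with VectorizedQuery and a vector_queries parameter), and the prompt wording is a paraphrase of the one in the video:

```python
import json

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import Vector

search_client = SearchClient(
    endpoint=search_endpoint,
    index_name="handbook",
    credential=AzureKeyCredential(search_key),
)

# upload the records saved in the JSON file
with open("docVectors.json") as f:
    search_client.upload_documents(documents=json.load(f))

# embed the query and retrieve the two nearest chunks
query = "What information does this handbook have?"
results = search_client.search(
    search_text=None,
    vectors=[Vector(value=generate_embeddings(query), k=2, fields="content_vector")],
    select=["content"],
)

# club the retrieved chunks together as context for the completion call
input_text = "\n".join(result["content"] for result in results)
response = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt=f"Answer the question based on the given input.\n\n"
           f"Input: {input_text}\n\nQuestion: {query}",
    max_tokens=256,
)
print(response["choices"][0]["text"])
```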
We have seen how easy it is to read a text file, generate embeddings out of it, push those embeddings into a vector database, and then extract the relevant content from the vector database and pass it on to OpenAI. This is, I would say, the initial flow, or the beginner's flow, which can help you get started with these services. If you are working with real-world examples, you may need to tweak it a bit: you may have to read or delete indexes, and you may need to think about a mechanism for updating your existing indexes. Those are the things I will be covering soon in my upcoming videos, but let's keep this video simple, so that we can understand how the components talk with each other to produce the required output. I hope you enjoyed watching this video, and do let me know in the comments what other ways you are using to integrate with vector databases. Thanks for watching!
Info
Channel: Shweta Lodha
Views: 8,416
Keywords: Azure, Azure Cognitive Services, ChatGPT, OpenAI, How to integrate OpenAI with Azure vector search database, Getting started with Azure vector database, End to end solution with Azure vector db and openai, Azure OpenAI, Azure OpenAI with langchain and vector database, How to build chatbot, how to use openai playground, openai api tutorial, ChatGPT on Azure OpenAI service, Azure vector search, Azure cognitive search with openai, openai word embedding with Azure cognitive search
Id: iZPxYK1B4Ps
Length: 16min 32sec (992 seconds)
Published: Tue Oct 03 2023