Open AI Embeddings in Azure Vector Database of Cognitive Search

Video Statistics and Information

Captions
I'm lost. Why? I'm trying to figure out the best way to connect ChatGPT with my own data. I don't know if I should use Azure Cognitive Search for indexing my documents and retrieval, or if I should use word embeddings and create a vector database. I'm confused. Very confused.

Well, you can use a hybrid approach.

What?

Well, recently Azure Cognitive Search added a new capability which enables you to store your word embeddings as a vector database and index them using Cognitive Search. That means you can now have both semantic search and word-embedding-based search to retrieve your knowledge in your chat-with-your-data scenario.

Can you show it to me now?

Sure. Can we pause recording? I just burned my leg.

Then let's go.

Hello everyone, this is MG, and welcome to another video. We've already talked a lot about how you can chat with your own data, or connect your enterprise data to ChatGPT: you chunk your data, figure out a way to index it, and bring in the relevant information based on the question asked of your chatbot, so the GPT models can answer that question. I've heard a lot of questions from you, which are certainly valid: how can I build the most efficient retrieval process? Should I use Azure Cognitive Search for indexing my data, so that when someone asks a question, Cognitive Search brings back the relevant sources to support the answer? Or, as we've seen people do, should I use word embeddings: create a vector database, store my word embeddings there, and when someone asks a question, retrieve the relevant data based on the similarity between the embedding of the question and the embeddings of my data sources? Which one is better? Which is more accurate? Which is more scalable? There's really no sharp, fixed answer to that question, but there is a less risky, recent solution that we're going to talk about in this video: a hybrid approach. You can actually have both, because Azure Cognitive Search recently added a new capability that lets you use it as your vector database. You can bring your word embeddings to Cognitive Search, save them there as a vector database, and Cognitive Search will index them for you. So if you have text, videos, images, different modalities of data, you can generate embeddings for all of them, store them in the vector database of Cognitive Search, and start retrieving them, say, for your chat-with-your-data scenario. Now let's check out the details of this capability, the best practices, and a quick start to see it in action. Before we start, make sure you subscribe and hit the bell icon so you get notified about the next video. Thank you.

All right, welcome back everyone, and let's get into the new capability just added to Azure Cognitive Search, called vector search. When we started talking about chatting with your own data using OpenAI models like GPT-3.5 and GPT-4, we said we need a way to bring in our sources of data, ask our questions against them, and show those sources in the prompt to the models. Because these models have a token limit, we can only show the specific part of our data that is relevant to the question being answered. So the main concern was how to retrieve the relevant sources from my data into the prompt, based on the user's question. We've already talked about word embeddings, and we've already talked about using Cognitive Search indexing to bring back the relevant data for answering a question. I'll add that video to the top right of the screen and to the video description; make sure you watch it before this one, so you have a better understanding of the proposed value of this new feature.

Then I got this question from you: should I use Azure Cognitive Search with semantic search to bring back the relevant data for answering a question, or should I create a vector database, generate word embeddings, and calculate the distance between embeddings to bring that source of data into the prompt and start chatting with my data? Today, with this new capability, you no longer need to create your own vector database alongside Cognitive Search, because Azure Cognitive Search now has everything. It can do semantic search to retrieve the data for you (semantic search uses a deep learning model behind the scenes to query your data, and it's actually what Bing search uses as well), and now, besides that, you can also have vector search. That means you can use Cognitive Search as your vector database and index those vectors, so that when you ask a question, its embedding is compared against the stored ones. Of course, Cognitive Search doesn't generate word embeddings by itself; that's why we have Azure OpenAI models, or even open-source models, to generate the embeddings. Cognitive Search then hosts them, and I can query them.

Okay, before we dig in and show an example: why should you use vector search, and in what scenarios? First of all, word embeddings are vectors, but embeddings can be generated not just from text; you can generate embeddings from images too. That means if you have different types of data (videos, which can be converted to images, or text), you can generate embeddings from all of them, create vectors, and save all of them in the same place, which is now the vector database of Cognitive Search. So next time you query something or want to retrieve some knowledge, it doesn't have to be just text; it can be images and videos as well. How you generate embeddings from images is a slightly different scenario: for example, you can use models like CLIP, or another Cognitive Services API on Azure (I believe under the Vision services) that can generate embeddings from images too. So the first benefit is that you can search across different data types, because all of them are compared as embeddings. The second, which is the one we'll focus on first, is that we can now search our text based on its vector embeddings instead of only doing a keyword search or a semantic search. Third, if you have text in multiple languages, it all gets converted to vectors at the end of the day, so we can query our data sources even across languages, because the vectors represent the context regardless of the language, as captured by the model that generates the embeddings. And the benefit that answers your question ("MG, which one should I use for retrieving knowledge: semantic search, Cognitive Search indexing, or a vector database?") is that now you can have all of them. You can get results from all the different approaches Cognitive Search supports, meaning semantic search and embedding-based search, then rank them and see which one says what. You can have everything combined instead of having to choose. There's much more; I'm not going to read all the documentation here, I just wanted to give you a walkthrough, and I'll add the linked documentation to the video description.
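The claim that vectors capture meaning regardless of language or modality comes down to comparing embeddings by distance. Here is a minimal, illustrative sketch of cosine similarity between embedding vectors; the numbers are toy values, not real model output (a real model such as ada-002 produces 1536-dimensional vectors):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" standing in for real model output.
english = [0.9, 0.1, 0.3, 0.0]
spanish = [0.85, 0.15, 0.35, 0.05]   # same meaning, different language
unrelated = [0.0, 0.9, 0.0, 0.8]

print(cosine_similarity(english, spanish) > cosine_similarity(english, unrelated))  # True
```

The same comparison works whether the vectors came from text, an image model, or different languages, which is why one vector index can serve all of them.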
One more thing I'm going to explain is how Cognitive Search detects which vector embeddings are closest to the question I ask when I chat with my data. Say I ask a question about topic A. That question gets converted to a word embedding, and that embedding is compared with the embeddings stored in the Azure Cognitive Search vector database to give me the closest ones. But what do we mean by "closest"? The documentation fully explains how Cognitive Search does this, but long story short, it maps the vector space of the embeddings and uses something called Hierarchical Navigable Small World (HNSW), which is a type of approximate k-nearest-neighbor algorithm. It identifies which stored vector embeddings are closest to the embedding of the question you asked, and retrieves them for you.

So now I want to show you a quick demo of how you can generate embeddings, push them to Azure Cognitive Search to build an index, and start asking questions. Let me open up the code. I'll also add a link to the source that gave me the idea for this code (I used code from there), so you can check it out. It recommends installing these pip packages: LangChain, for having a chat conversation with my data; azure-search-documents, which I'll show you is used for Cognitive Search; openai, which is definitely needed; and azure-identity. Now I need a couple of credentials: one set for connecting to my Azure OpenAI service, for generating embeddings and answering the questions in my chatbot, and another for connecting to my Cognitive Search service, for pushing my embeddings there, doing the indexing, and retrieving the knowledge from there.
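Conceptually, the nearest-neighbor retrieval behind vector search can be sketched as a brute-force k-nearest-neighbor scan. This is only an illustration of the idea: the actual service builds an HNSW graph index so it does not have to compare against every stored vector, and the document IDs and tiny vectors below are made up:

```python
import math

def cosine_distance(a, b):
    """Distance = 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def k_nearest(query_vec, docs, k=2):
    """docs: list of (doc_id, vector). Return the ids of the k closest docs."""
    ranked = sorted(docs, key=lambda d: cosine_distance(query_vec, d[1]))
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical chunk embeddings (real ones would come from an embedding model).
docs = [
    ("chunk-azure-openai", [0.9, 0.1, 0.2]),
    ("chunk-pricing",      [0.1, 0.9, 0.3]),
    ("chunk-quickstart",   [0.8, 0.2, 0.3]),
]
question = [0.85, 0.15, 0.25]  # embedding of the user's question
print(k_nearest(question, docs, k=2))
```

The two chunks pointing in roughly the same direction as the question win; the unrelated one is filtered out. HNSW reaches the same kind of result approximately, in far less time on large collections.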
You need to have all these credentials in a secrets.env file and load them from there; you can also have them as operating-system environment variables. Just for this test I hardcoded them, and I removed them after executing the code. Where do you grab these values? Let me show you how to get the OpenAI key and endpoint, and the service name and key for Cognitive Search. We'll also create an index for our word embeddings; the index name is just a random name I chose, and it can be anything for you. So let me show you in the Azure portal how to grab this information. This is my OpenAI service: click on "Keys and Endpoint" and you'll see the endpoint and the key of your Azure OpenAI resource highlighted. Those are the two pieces of information you need to connect to your Azure OpenAI. For Cognitive Search, here it is: this is my Cognitive Search service. Copy the URL from the overview page, then click "Keys" on the left side and copy one of the keys there; that's also needed for connecting to Cognitive Search.

Back in the code, assuming you've grabbed all this information, I first need to initialize two models. The first is GPT-3.5, the model the initial ChatGPT used; I'm using it from my Azure OpenAI service for the chat conversation and for answering my questions. The second is Ada, for generating word embeddings of the data I have. Where is the data? The reference code I used ships with some sample data, so I'm using that. Let me actually show you: this is the folder I'm working in, with the Python code I'm walking through, the secrets file I added, and the data I downloaded from that reference code. I think it's about the Azure OpenAI documentation. These are just sample text files that I want to chat with and index with their word embeddings using Cognitive Search.
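Loading credentials from a secrets.env file can be as simple as parsing KEY=VALUE lines into environment variables. This is a hand-rolled sketch (packages like python-dotenv do the same job more robustly), and the variable names and placeholder values below are illustrative, not the exact names from the video's code:

```python
import io
import os

def load_env(stream):
    """Parse KEY=VALUE lines (skipping blanks and # comments) into os.environ."""
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        os.environ[key.strip()] = value.strip().strip('"')

# Example secrets.env contents; placeholder values, never commit real keys.
secrets = io.StringIO(
    "# Azure OpenAI\n"
    "OPENAI_API_BASE=https://my-openai.openai.azure.com/\n"
    "OPENAI_API_KEY=<your-azure-openai-key>\n"
    "# Azure Cognitive Search\n"
    "AZURE_SEARCH_ENDPOINT=https://my-search.search.windows.net\n"
    "AZURE_SEARCH_KEY=<your-search-admin-key>\n"
    "INDEX_NAME=mg-test\n"
)
load_env(secrets)
print(os.environ["INDEX_NAME"])  # mg-test
```

In real use you would open the actual secrets.env file instead of the in-memory example, and keep that file out of source control.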
Getting back to the code: this is the chat model we're going to use, with the information needed for authentication, and I also need to connect to Cognitive Search for indexing the generated word embeddings; I'm grabbing those credentials from the variables specified at the top. Then I say that for generating embeddings I'm using this embedding function, which uses my Ada model on Azure OpenAI. That's it. Note that I haven't created any embeddings yet; I've only specified the model I'm going to use. Next I load the data from the current working directory, meaning the text files I showed you, and of course you need to chunk the data. As usual, we chunk the data and then generate the embeddings. The chunk size here is 1000 (I think that was the default value), and you can set up some overlap; here I just used zero, but as a best practice, to make sure you're not losing context when you chunk the data, you can duplicate some portion of your chunks with an overlap size. Then I split my data and add the documents to my Cognitive Search index, and by adding, I mean I'm not only adding the text, I'm adding the word embeddings as well.

Now I create a conversation chain using LangChain: I'm saying I want my large language model, which is GPT-3.5, to answer questions based on the sources I provide, retrieved using Cognitive Search, specifically the ACS retriever defined above. When everything is ready, I execute it, which actually generates the word embeddings, and then I start asking questions like "What is Azure OpenAI Service?" If you remember, that was in one of my text files. It went to Azure Cognitive Search, first calculated the word embedding of this question, then compared its distance to the word embeddings generated for the chunks of my text files, and answered me based on the closest word embeddings.
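The chunking step (fixed-size chunks with an optional overlap so context isn't lost at chunk boundaries) can be sketched in a few lines. The video uses chunks of 1000 characters with zero overlap; the sizes below are shrunk so the behavior is visible, and this is a simplification of what a real text splitter does:

```python
def chunk_text(text, chunk_size=1000, overlap=0):
    """Split text into fixed-size chunks; consecutive chunks share `overlap` chars."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "Azure OpenAI Service provides REST API access to OpenAI models."
for c in chunk_text(doc, chunk_size=20, overlap=5):
    print(repr(c))
```

With overlap=5, the last five characters of each chunk reappear at the start of the next one, so a sentence cut at a boundary still shows up whole in at least one chunk; that is the context-preserving effect mentioned above.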
And there you go: this is the answer. I tried another question, and there you go, that answer got generated as well. But let's see what happened when I generated the word embeddings from these texts and stored them in Cognitive Search. If I go back to the Azure portal, this is the Cognitive Search service I used for storing those word embeddings, and you can see an index called "mg-test" was created. Why? Because I simply used that as my index name, though technically it can be anything. Let me click on it: it tells me I have six documents. By the way, I didn't have six files; these are chunks. I have, I think, three text files, but I chunked them, which is why there are six documents here. Now, because these are just vectors, let me run a query (just a number) to see if it returns anything. Okay, there you go; here's an example, and I want to show you what's happening. You can see the content of a chunk, which is the actual text I had, and then the vector generated from that content. So when I do a vector search, I'm calculating the distance between the embedding vector of my question and this content vector I'm pointing at. It's all generated and stored here for me, along with some metadata, for example the source of the data. That's useful if you want to add citations: when your chatbot answers a question and you want to know which sources the answer is based on, you can use this metadata. This is just the vector query I used from the code, but remember that Cognitive Search already had other ways of querying your data, for example the semantic search I explained earlier. You can activate that as well, enrich your indexing, and when you retrieve information from Cognitive Search, don't rely only on semantic search or keyword-based search; you can now also have vector search, as we just implemented.
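Each document that ends up in the index carries the chunk text, its vector, and metadata such as the source file, which is what makes citations possible. A sketch of what one stored document might look like and how the source backs a citation; the field names mirror what the portal showed in the video, but treat the exact schema and values here as illustrative:

```python
import json

# One indexed chunk, shaped like the fields seen in the portal
# (id, content, content_vector, metadata); all values are made up.
document = {
    "id": "chunk-0001",
    "content": "Azure OpenAI Service provides REST API access to OpenAI models...",
    "content_vector": [0.012, -0.034, 0.101],  # truncated; ada-002 yields 1536 floats
    "metadata": json.dumps({"source": "data/azure-openai-overview.txt"}),
}

def citation_for(doc):
    """Pull the source file out of the metadata so an answer can cite it."""
    return json.loads(doc["metadata"])["source"]

print(citation_for(document))  # data/azure-openai-overview.txt
```

When the chatbot answers from this chunk, the metadata's source path is what you would surface as the citation.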
By default, when you store your embeddings in Cognitive Search, this is how the index and its fields are generated: an ID; the content, which is the text from my source files; the vector I showed you; and the metadata, which describes the source the word embeddings came from.

Okay, so that was a pretty simple and quick explanation: first, what vector search is, making sure you're up to date with what has recently been added to Azure Cognitive Search, which I think is a great contribution; then its proposed value and how you can enrich it with other retrieval methods; and finally a quick walkthrough of how to implement it in code. As you saw, I didn't use the native Cognitive Search indexing or semantic search at all this time; I simply generated my word embeddings and stored them in Cognitive Search, without creating an external vector database, launching Redis, or figuring out how to put all those vectors with their metadata into my own vector store. None of that: the connection to Azure Cognitive Search took care of everything for me, and you saw that I was able to ask questions about my data. I'm now chatting with my data, and it worked properly.

All right, I've been talking for about fifteen minutes just for the tutorial part; I wanted to make it a bit shorter, while making sure I cover the possibilities and keep you aware of what's going on and what the trends are with Cognitive Search and this chat-with-your-data scenario. I hope it was short and comprehensive enough to let you get started, and as always, you're more than welcome to leave your questions or any comments you have down below the video.

All right, that's all. There is only one thing that is common to all human beings across the world: imperfection. You can try so hard to be perfect, but that is never going to happen, because it is not in human nature to be perfect. And you know what the solution for it is? Forgiveness. Take one moment and forgive someone who has sinned against you, and at the same time, try to ask for forgiveness from those you have sinned against. This will move a big stone from your shoulders, and it will bring you so much positive energy: you will be free, more powerful, and more concentrated. Dream big, my friends, believe in yourselves, and take action. Until the next video, take care.
Info
Channel: MG
Views: 9,332
Keywords: open ai, gpt 3 ai, gpt 3, azure ai, openai chatbot, gpt 3 fine tuning, openai gpt 3, gpt 3 prompt engineering, openai chatbot demo, open ai chat gtp, Azure, Azure Open AI, ChatGPT in Azure, Open AI in Azure, Open AI in Azure Demo, Azure Open AI demo word embeddings, chat gpt, advanced tutorial chatgpt, Azure Cognitive Search with Open AI, Azure cognitive search vector database, Open AI word embeddings with azure cognitive search, Word embeddings in azure cognitive search
Id: Re4fLSKi43A
Length: 21min 30sec (1290 seconds)
Published: Tue Aug 01 2023