I built my own SEMANTIC TEXT SEARCH WEB APP using OPENAI EMBEDDINGS + STREAMLIT | ada-002 engine

Video Statistics and Information

Captions
Hey guys, I'm Avra, and welcome back to my channel. Today's video introduces the text and code embeddings that the OpenAI endpoints provide, and trust me, they're pretty powerful. I'll show you why with a demo app I built using Streamlit and these OpenAI embeddings, and you'll see it does the semantic search job pretty well. First, let's see what an embedding is. If I quickly go to the OpenAI documentation: embeddings are nothing but a measure of relatedness between two text strings; you're trying to understand how related two strings are. That's useful for search, clustering, recommendations, classification, all these purposes; embeddings are pretty powerful. Which models does OpenAI provide? Going back here, under GPT-3 there is Davinci, the most capable and most recent one, which we discussed in a previous video as well; Curie, which is also pretty good; Babbage; and Ada. In this video I'll be using text-ada for semantic search. Why Ada? Because it's not as expensive as the others. Let's quickly go to the pricing section: Davinci is $0.02 per 1K tokens, whereas text-embedding-ada-002 is $0.0004 per 1K tokens, so it's considerably cheaper than the other models. It's not the most powerful one, but it is the fastest, and for this kind of daily automation task text-ada does the job pretty well. What is text-ada capable of? As I said before, it's good for parsing, simple classification, address correction, and keywords; those are exactly what we need for daily jobs, and something we can build on and scale up at the end of the day. Text-ada does this job for very little money, and it does it fast.
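As a quick sanity check on those prices, the cost arithmetic can be sketched like this (the per-1K-token rates below are the ones quoted above from the video-era pricing page; the model names in the dictionary are illustrative, so double-check the current pricing page before relying on them):

```python
# Back-of-envelope embedding cost. Rates are dollars per 1,000 tokens,
# as quoted in the video; current prices may differ.
PRICE_PER_1K = {
    "text-embedding-ada-002": 0.0004,
    "davinci": 0.02,  # the Davinci-tier rate mentioned for comparison
}

def embedding_cost(n_tokens: int, model: str) -> float:
    """Return the dollar cost of embedding n_tokens with the given model."""
    return n_tokens / 1000 * PRICE_PER_1K[model]

# Embedding one million tokens:
print(round(embedding_cost(1_000_000, "text-embedding-ada-002"), 6))  # 0.4
print(round(embedding_cost(1_000_000, "davinci"), 6))                 # 20.0
```

So at these rates, Ada embeds a million tokens for about 50x less than Davinci, which is why the video picks it for a small personal-notes app.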
So why not leverage the text-ada model and build something that's very easy to build? Let's see what I've built with it: something very minimal, called Notebook AI. Basically, I dumped all the notebook notes I've taken over time into an Excel sheet, got the vectorized form of them by computing the embeddings (which we'll do today as well), and built an app on top of that; when you search something, it gives you the matching context. So all the notes I've taken are dumped as a dataset. Let's search for "nature" and press the Run button: all the context that comes up is based on this particular keyword "nature", and this "Nature Plants" hit is a scientific journal. The number in green is the relatedness, since we calculate the relatedness between two strings. Taking a term from the previous context, I press the Run button again; let's see if we get context based on "PS1 tetramer". The relatedness is much higher now, around 0.85, and we get the keyword "tetramer" here, and again here, as we'd expect for the Photosystem I tetramer; the third hit is probably not so relevant. Now let's take this particular keyword, "PBS PSII and Photosystem I", paste it here and press the Run button: all the output contains the word PBS. That's pretty powerful; it's giving us the results we expected, and the Ada model is doing its job nicely. So how do we build this kind of Notebook AI? I'm pretty sure that's something you'll be interested in, and for that I've divided this tutorial into two sections. The first section is where we use Google Colab or a Jupyter notebook to vectorize our data.
At this point I don't have that much data; this dataset has only 20 or 25 rows, but it still does a good job, using the very simple, straightforward embedding syntax OpenAI provides. Once that part is done, we have a dataset with the vectorized form, the embeddings, in it. Then we load that data and run our search term against it, and for that we'll use a Streamlit front end, a simple UI we'll build to make a search engine out of it. I'm pretty sure that's something you'll be curious about, so let's go to the coding session and write a few lines of code together. Let me quickly walk you through the code I've written to compute these embeddings on the data we already have. The first part is installing openai and pandas; since they're already installed on my computer, it shows the requirements are already satisfied. Once they're installed, the next part is importing the modules: pandas, which we just installed; openai, which we'll use for the API key and everything else; and, most importantly, the get_embedding function, which we import from openai.embeddings_utils. This function will help us vectorize our data, so it's pretty crucial. We run this part as well. Once that's finished, let me show you the data we'll be using. I've just used one column, so we don't need to inspect or tweak the data much; it's very straightforward. I have my data already uploaded here, notes.xlsx. As you can see, it's the notes: only one column with a lot of stuff dumped into it, but only 22 rows, which is pretty few. I would rather go for more rows, but unfortunately I was too lazy to dump all the notes I have.
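The one-column dataset described above can be mimicked like this (a minimal stand-in; the real file in the video is notes.xlsx with ~22 rows of personal notes, and the example note texts below are made up for illustration):

```python
import pandas as pd

# Stand-in for the single-column notes spreadsheet from the video.
notes = pd.DataFrame({
    "notes": [
        "Photosystem I can assemble into a tetramer in some cyanobacteria.",
        "Nature Plants is a scientific journal covering plant biology.",
        "Phycobilisomes (PBS) funnel light energy to PSII and PSI.",
    ]
})

# The real notes.xlsx would be loaded the same way with:
#   df = pd.read_excel("notes.xlsx")
df = notes
print(df.shape)  # (3, 1) -- three rows, one "notes" column
```

A CSV file works just as well with `pd.read_csv("notes.csv")`, as mentioned in the video.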
It would be better to go for a much larger dataset; nevertheless, we'll just create the embeddings on top of this one. First, let's load the data. We use pandas read_excel because it's in Excel format; you can also use a CSV file, in which case you'd use read_csv, basically pd.read_csv like this. That's it. Once we do this part we can dump our data out here, and it's loaded now. The next part is inserting our OpenAI API key. If you've been on my channel before, I've described how to get the OpenAI API keys: with an OpenAI account you go to "View API keys", create a new key, and immediately copy it; you'll just need to paste it once we get the input box. For that we either keep the key in a separate configuration or secrets file, or use the getpass module, which gives us a field where we can input our secret key. So I'll quickly go and generate a new secret key, copy it, and paste it here; you can't see what I pasted, because it's hidden, but we have entered our OpenAI API secret key. Once this part is done, we can use any model the OpenAI API offers. We use text-embedding-ada-002, which we've been discussing all this time: it's the fastest and much cheaper, so why not use it. We run this get_embedding function on each and every row; if you look at this file, every row needs to get embedded, to get vectorized, and for that we use a lambda inside apply. It iterates over each and every row, and as a result we get the vectorized form for every row in one line of code. Otherwise you could just loop over them and do the same thing; that's also perfectly fine.
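The per-row embedding step can be sketched as follows. Since calling the real endpoint needs an API key and network access, `fake_embedding` below is a stand-in I've made up for `openai.embeddings_utils.get_embedding`; in the video the real call is `get_embedding(text, engine="text-embedding-ada-002")`, and the apply/lambda pattern is the same:

```python
import hashlib
import pandas as pd

def fake_embedding(text: str, engine: str = "text-embedding-ada-002"):
    """Stand-in for openai.embeddings_utils.get_embedding: returns a small
    deterministic vector instead of calling the API. The real ada-002
    engine returns a 1536-dimensional vector."""
    h = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in h[:8]]  # toy 8-dim vector

df = pd.DataFrame({"notes": ["first note", "second note"]})

# Same pattern as the video: apply the embedding function to every row
# with a lambda, storing each vector in a new "embedding" column.
df["embedding"] = df["notes"].apply(
    lambda x: fake_embedding(x, engine="text-embedding-ada-002")
)

# Save for the Streamlit section, as done in the video.
df.to_csv("notes_embedding.csv", index=False)
print(len(df["embedding"].iloc[0]))  # 8
```

Each call to the real `get_embedding` costs tokens, which is why the video warns about re-running this cell.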
It's pretty straightforward. Once this part is done, we save it as notes_embedding.csv, and that's all we need from this Jupyter notebook, or rather Google Colab. I'm not running it again because it would charge me again, but once you run it, it creates the notes_embedding.csv file; you can see that a single row of notes produces this long vectorized form. That's something the model will leverage; it's not for us, we can't make sense of it, but the model will use it. That's all for the part where we create the notes_embedding.csv file. Once we have it, we can go to the next part, where we create the UI. That will be very straightforward: if you've been on my channel, you've seen a lot of Streamlit videos, and we've built a chatbot before; today we'll just use a simple input box where we type some text and connect it with these embeddings. So let's go to the implementation of the UI. Once we have the vectorized form of our data, which we discussed in the previous section, the next part is to create the UI for this Notebook AI we plan to make, and it's very easy to do with Streamlit. Streamlit is a Python front-end framework that makes it very easy to deploy web apps; I've made a lot of videos about it on my channel, so please check them out. It's very easy to install with pip: pip install streamlit. You need to make a folder and have a Python file called app.py, as I did here; you can name it whatever you want, but it needs to end with .py. Once you have that, you can start writing your code. The things we need to import are streamlit, openai and the other dependencies. I'll just copy and paste the code from my GitHub repository; you can find all the code there, and I'll put the link in the description box.
Let me discuss a few of the packages I installed and the few lines we need to make this web app. First, we have the pandas library, which we need to read the CSV file. Next we need numpy, because we'll convert our embeddings to numpy array format. We need streamlit to create the web app, and openai for its API key and endpoints. We also need openai.embeddings_utils; this is the most important part, since we need the get_embedding module we used before, this time for the search term the user will put in, and also cosine_similarity to find the similarity between the search term and our data. The first line we wrote here is the typical Streamlit syntax st.title. Let me show you how the app looks; we'll go line by line. What I'll do is comment out the code below this point, and the app is already served on localhost with "streamlit run app.py", since my script's name is app.py. That's all you need to type; it's enough for the app to run on localhost. This text_input call creates a text input widget that lets the user put in his or her own OpenAI API key. I just paste mine here; you can see it doesn't reveal what the API key looks like as I type, because the type is "password". Once we have the secret key, all we need to do is assign it; we could also store it in a session state, but that's not critical for our purpose. We just use openai.api_key and pass it the user secret, which is the output from this widget. One of the most critical things in the app is loading our data: I use pandas read_csv and explicitly write the path of my file.
That file is notes_embedding.csv, the one with the vectorized embeddings we made earlier; we just load it. To see how the DataFrame looks, we write st.dataframe and pass in the variable we assigned, so I say df = load_data() and dump the output here. Now let's look at it: this is essentially the Excel sheet we created before. I'll comment this part out again. Next we come to the main function, the one we use to find the similarities between our search text and the DataFrame we created, and for that we write these few lines of code. The purpose of this search_notebook function is to find the most similar notes in the DataFrame df, which we pass as an argument (basically our data loaded from the CSV file), for the search term provided by the user. We compute the similarity between them, the model throws us the result, and the return value is the ranked DataFrame; that's all we need to get the similar entries. Let's go step by step through what it does. First, we convert the values in the embedding column to numpy arrays; we already have a DataFrame where the embedding column is present, as you can see here, and those values get converted. Then we get the embedding of the search term the user provides. Speaking of which, we can now create our search box. It's another text_input, but here we don't use the password type, because there's no need to hide the search box. This will be the search input passed into our function, and the next line
gets the embedding of the search term. Just as we created embeddings for each and every row of our real dataset before, for this search term we create the embedding using the engine text-embedding-ada-002. Once that's done, we compute the similarities between the search term and our data using the cosine_similarity module we already imported, and we dump the results into a new column called "similarity". That's very simple; that's all we need. Then we sort the values, which is what this line does, so that the highest similarity comes out on top; that's very trivial pandas usage. Then we just print, or rather return, the result; even this line isn't strictly required. Now we have a new text box where the user can type any query he or she wants. Once that input box is filled and we press Enter, the next part of the code runs. If I type something, press Enter, and just output the search term, you'll see the search term come up; but we won't use it like that. Instead, we'll feed this search term into the search_notebook function we wrote. Before we dump it into our function, though, we need to make some checks. Instead of a plain check, we can make a search button, which will run our model; you see this error only because I've commented that part out. We create this search button, and once someone clicks it, they'll see whatever is hiding behind it: it starts running, and the next part is these few lines of code. Let's see what these lines do.
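The ranking logic just described can be sketched end to end. Two hedges: `cosine_similarity` here is a local reimplementation of the formula (the video imports it from `openai.embeddings_utils`), and `search_notebook` takes an already-computed search embedding rather than calling the API, so the demo can run offline with toy 2-d vectors:

```python
import ast
import numpy as np
import pandas as pd

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (same formula as
    openai.embeddings_utils.cosine_similarity)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search_notebook(df: pd.DataFrame, search_embedding, n: int = 5) -> pd.DataFrame:
    """Rank the notes in df by similarity to the search embedding."""
    df = df.copy()
    # CSV round-tripping stores each vector as a string like "[0.1, 0.2]",
    # so parse it back into a numpy array first.
    df["embedding"] = df["embedding"].apply(
        lambda e: np.array(ast.literal_eval(e)) if isinstance(e, str) else np.array(e)
    )
    df["similarity"] = df["embedding"].apply(
        lambda e: cosine_similarity(e, search_embedding)
    )
    # Highest similarity first, keep the top n hits.
    return df.sort_values("similarity", ascending=False).head(n)

# Tiny demo with 2-d toy vectors instead of real 1536-dim ada-002 embeddings:
df = pd.DataFrame({
    "notes": ["about plants", "about journals", "about tetramers"],
    "embedding": ["[1.0, 0.0]", "[0.0, 1.0]", "[0.9, 0.1]"],
})
top = search_notebook(df, [1.0, 0.0], n=2)
print(top["notes"].tolist())  # ['about plants', 'about tetramers']
```

In the real app, `search_embedding` would come from `get_embedding(search_term, engine="text-embedding-ada-002")`, exactly as described above.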
The first thing is: if there is a search term. Basically, if it's empty this won't work; only if there is a search term will the button be generated. We can actually write it that way: if there's a search term, we generate the button, and then we don't need this extra line of code. So whenever there is a search term, the button is generated; this way we make sure there's some text in the input widget, and once that input is there, the button shows up. The next part is feeding all the arguments into the search_notebook function we've been discussing. The first argument is the data, basically the df we loaded, so we just pass it in. We also pass the search term, because it will again go through the get_embedding function, and the next argument is how many outputs we want; we can take the first five most similar hits. We could also print it, but the function doesn't require that. Finally, whatever output we get, I want to iterate through the rows and present it nicely. Say we write "PBS PS2 PS1", which we searched at the beginning of this video and got a very nice result for. If I press the Run button now, it gives me the five most similar hits, and it's pretty accurate; it shows me all the context for that particular search term. That's pretty cool, exactly what we were hoping for. We can change it to something else; let's try "megacomplexes", I don't know if this will work or not. We run it, and the first hit doesn't contain "megacomplex", but from the next one onwards we start to see it. Maybe our model needs to be more
refined so that we get better output. As you can see, with just a few lines of code I could build this Notebook AI system and search for any query; say we search for a tetramer or PS1, and we get the output from it. That's a pretty cool semantic search engine that finds the context of the text we're looking for, like a notebook we no longer need to waste time paging through; our AI does that part for us. I hope you enjoyed this video, where we talked about the embeddings OpenAI offers and the semantic search that helps us scan through any notebook, application, or text we feed into it; it will be very powerful when we scale this web app up. I hope you'll try different things out and write in the comment section what you think about this and what other approaches you might take, and I'll be happy to make more videos on OpenAI implementations and their APIs. So please subscribe to my channel, share this video and give it a thumbs up; it would be great to receive feedback from you all. Cheers!
Info
Channel: Avra
Views: 5,584
Keywords: openai, openai gpt 3, openai gpt, openai embeddings, openai chatbot, nlp tutorial, nlp training videos, openai nlp api, openai nlp, streamlit python, streamlit dashboard, machine learning python, streamlit tutorial, streamlit machine learning, gpt 3 nlp, gpt 4, gpt 3 fine tuning, gpt 3, gpt text classification, nlp gpt 3, chat gpt nlp, nlp, nlp openai, embeddings from language models (elmo), natural language processing, machine learning, artificial intelligence
Id: 393BsKexv2A
Length: 18min 44sec (1124 seconds)
Published: Mon Feb 06 2023