How To Save OpenAI Embeddings In Database

Video Statistics and Information

Captions
Hello everyone. In this video I will show you how to save embeddings in a database, which means you persist your embeddings and do not need to recreate them every time you run your application. Here we are using OpenAI, LangChain, Pinecone, and Python as the programming language. If you are not sure what Pinecone is, you can go through its documentation, and a brief summary of how to use it with LangChain is available at langchain.readthedocs.io. The very first thing you need to do is install the Pinecone client before getting started with this database.

Before writing any code, I want to show you that I have created a folder containing four text files. We will read these four files, generate embeddings for them, and then push the embeddings to our vector database, which is Pinecone.

Let's get started. First, from langchain.document_loaders we import DirectoryLoader, because we want to read a directory. Then we need to split the text, because we do not want to hit the token-limit issue; we split it into fixed-size chunks so that we can process it easily, so from langchain.text_splitter we import CharacterTextSplitter. This is code you may have already seen in my previous videos, so there is not much to explain again, but I will briefly walk through it.

I create a loader that points at the "store" directory, and call loader.load() to get the list of documents. Then I create a CharacterTextSplitter object, in which you define the chunk size: let's go with 1000 as the chunk size and a chunk overlap of 0. It is okay to miss some information in this case, but if you do not want to miss anything, make sure to define a valid value for the chunk overlap. Then I call split_documents on the splitter, passing in the loaded documents, and store the result in doc_texts. Let me quickly run this.

Now I can show you what doc_texts contains. Initially we had four documents, but after splitting you can see we have more than four, because a single chunk should not exceed a thousand characters, so the content gets distributed across that many documents. You can also see how the text was broken: the first document starts with the section on homelessness, and the second document picks up at "effects on them", so that is how the splitter breaks the documents.

Once we are done with the character splitting, the next thing is to get the database ready. For that we need Pinecone, which I have already installed, so I am not installing it again, but let's import the required modules: from langchain.vectorstores we import Pinecone, we also need OpenAIEmbeddings, which is again part of LangChain, and finally the pinecone client itself. Next we need to decide which embedding model to use; for that, you can go to the OpenAI documentation.
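The chunking behavior described above can be illustrated with a tiny stand-in for CharacterTextSplitter. This naive version just slices by character count (the real splitter also respects separators), but it shows what chunk_size and chunk_overlap mean:

```python
def naive_chunks(text, chunk_size=1000, chunk_overlap=0):
    """Slice text into chunks of at most chunk_size characters,
    with consecutive chunks sharing chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 2,500-character text yields three chunks: 1000 + 1000 + 500 characters
print([len(c) for c in naive_chunks("x" * 2500)])  # → [1000, 1000, 500]
```

With chunk_overlap=0, text that straddles a chunk boundary is split mid-sentence, which is exactly the information loss mentioned above; a small positive overlap repeats the boundary region in both chunks.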
In the documentation provided by OpenAI, click on Embeddings on the left-hand side and you will see all the options. If you go with the model text-embedding-ada-002, the output dimension is 1536. Let's use that model, so I set embedding_model to "text-embedding-ada-002", and then create the OpenAIEmbeddings object, for which I pass my OpenAI key. I have already grabbed the key, so I am not going to type it here; I just import my configuration. If you are not sure where to grab this key: log in to OpenAI, click "View API keys" on the left-hand side, and copy your key from there.

Once we have the embeddings object, let's initialize Pinecone. Pinecone has an init function which takes a few parameters, as you can see in the IntelliSense: the first is the API key, and we also need the environment. Let me show you how to get started with Pinecone: go to app.pinecone.io, log in, and you will see the console. The very first thing to do here is create an index. You can also do this from Python code, but I feel the web console makes it easier to understand what we are doing. We need to define the name of the index; I will pick something related to my text, say cause-effects-homelessness, and keep it handy because we will need it again.
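The setup steps so far can be sketched as below, following the early-2023 langchain and pinecone-client APIs used in the video. The function name and the key/environment parameters are placeholders of mine; the third-party imports sit inside the function so the sketch stays self-contained:

```python
EMBEDDING_MODEL = "text-embedding-ada-002"  # 1536-dimensional output vectors


def init_embeddings_and_pinecone(openai_key, pinecone_key, pinecone_env):
    """Create the embeddings object and initialize the Pinecone client."""
    from langchain.embeddings.openai import OpenAIEmbeddings
    import pinecone

    # Embeddings object used later to vectorize the document chunks
    embeddings = OpenAIEmbeddings(model=EMBEDDING_MODEL,
                                  openai_api_key=openai_key)
    # API key and environment both come from the Pinecone console
    pinecone.init(api_key=pinecone_key, environment=pinecone_env)
    return embeddings
```

You would call this once at startup with your real keys, for example `embeddings = init_embeddings_and_pinecone(config.OPENAI_KEY, config.PINECONE_KEY, "us-west1-gcp")`, where the config names are hypothetical.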
For the dimensions, go back to the documentation: for this particular embedding model the dimension is 1536, so that is what I enter. I am going with cosine as the metric, and for the last setting there are three options; choose whichever you want, I am going with the default, and then click Create Index. You can see the index is initializing. Meanwhile, I grab the environment value from the console and place it in the init call, and similarly grab the API key from the Keys section and paste it in. Once this is done, our database initialization is complete. Let me also give the index a name in code: index_name should be the same as the name we gave in the console, so cause-effects-homelessness. Let's review it, and if I run this now, it executes successfully.

Back in the console, the index takes a few seconds to become ready; refreshing still shows zero vectors, so let's give it a few more seconds. Once it is done, you also have the ability to verify the statistics of your index: create an index handle with pinecone.Index, passing the index name we just created, and call describe_index_stats on it. You can see that the dimension is 1536 and right now the vector count is zero, because we have not pushed anything yet, which is perfectly fine.
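The verification step can be sketched like this, again following the early-2023 pinecone-client API; the helper name is mine, and pinecone.init must already have been called:

```python
def index_stats(index_name):
    """Return the stats (dimension, vector counts) for an existing index.

    Assumes pinecone.init(api_key=..., environment=...) was called earlier.
    """
    import pinecone

    index = pinecone.Index(index_name)
    # Reports the index dimension (1536 here) and how many vectors it holds
    return index.describe_index_stats()
```

Right after creating the index this should report a vector count of zero, and after the upsert in the next step it should match the number of split documents.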
Now we are going to push some data into the index. In a new code cell I write doc_store = Pinecone.from_texts, passing three different parameters: the first is the list of the page content of all the documents, which I build as a comprehension over doc_texts taking page_content from each; then the embeddings object we created above; and then the index_name we created in the console. Let me run it. It succeeds, and after a few seconds we should see some vectors in our database.

You can see nine vectors here, and the reason you are seeing nine is that there are nine split documents, so each document becomes one single vector. That is how you save your embeddings into the database. Going forward, whenever you want to read the documents and query them, you do not need to repeat the entire process we have done so far; you can simply read from Pinecone and fire your queries.

Right now we have this doc_store object, so we can definitely give it a try. Say we want to query something: "What are the effects of homelessness?" That is the query I want to perform, and I am looking for an answer from those documents. I call docs = doc_store.similarity_search(query), which asks for all the documents similar to this query, the ones that can answer this particular question; that is why it is called similarity search. When we inspect docs, these are the five documents that can answer this particular question.
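The upsert-and-search step described above can be sketched as follows, using the early-2023 langchain vectorstore API; the wrapper function is my own framing of the steps:

```python
def push_and_query(doc_texts, embeddings, index_name, query):
    """Upsert one vector per split document, then run a similarity search."""
    from langchain.vectorstores import Pinecone

    doc_store = Pinecone.from_texts(
        [d.page_content for d in doc_texts],  # raw text of every chunk
        embeddings,                           # OpenAIEmbeddings from earlier
        index_name=index_name,                # index created in the console
    )
    # Returns the chunks whose vectors are closest (cosine) to the query
    return doc_store.similarity_search(query)
```

A typical call would be `docs = push_and_query(doc_texts, embeddings, "cause-effects-homelessness", "What are the effects of homelessness?")`.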
Going forward, the system does not need to go through all of your documents, whether you have ten or twelve hundred; it gets the answer from just these five documents, selected by the cosine similarity metric we chose.

The next thing I can show you: suppose you run this same query, but you are not interested in which documents matched, you just want the answer to your question. For that you can import question answering from LangChain: from langchain.chains.question_answering import load_qa_chain. We create an LLM object the same way we have done earlier, passing the mandatory parameters such as the OpenAI key from the configuration, and then create a QA chain object by calling load_qa_chain, which takes parameters like the chain type and the language model. Let's go with the "stuff" chain type, because now we have a very limited number of documents to stuff into the prompt. Once this is done, we can go ahead and run our query: qa_chain.run with input_documents set to docs and question set to query. Let's try this one; perhaps I need to import something... and we got our answer. It says the effects of homelessness include death, reduced life expectancy, health problems, and so on, and these all come directly from the documents.

I have shown you how to do this, but in practice you will not use this particular method, because we want to query our index directly rather than going through all these steps; I will cover that part in my upcoming videos. I hope you found this content useful, wherein we saved our embeddings into the Pinecone database. Thanks for watching!
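The question-answering step described in the captions can be sketched as below, following the early-2023 langchain API; the function name is mine, and the OpenAI key is a placeholder:

```python
def answer_question(docs, query, openai_key):
    """Stuff the retrieved chunks into one prompt and ask the LLM."""
    from langchain.llms import OpenAI
    from langchain.chains.question_answering import load_qa_chain

    llm = OpenAI(openai_api_key=openai_key)
    # "stuff" concatenates all retrieved documents into a single prompt,
    # which works while the retrieved set is small enough to fit
    qa_chain = load_qa_chain(llm, chain_type="stuff")
    return qa_chain.run(input_documents=docs, question=query)
```

Paired with the retrieval step, the usage is `answer_question(docs, "What are the effects of homelessness?", config.OPENAI_KEY)`, returning a plain-text answer synthesized from the retrieved chunks.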
Info
Channel: Shweta Lodha
Views: 1,333
Keywords: Artificial Intelligence, VsCode, Programming tutorial, Programming, Python coding, OpenAI, Machine learning, Shweta lodha, ChatGPT, What is embedding in OpenAI, What is langchain?, Integrate OpenAI with Langchain, OpenAI token error, How to Save Embeddings permanently, How to save embeddings in vector database, Saving embeddings in Pinecone, How to save openai model, How to use langchain with pinecone and openai
Id: TjNQifaG-HA
Length: 17min 20sec (1040 seconds)
Published: Tue Mar 21 2023