Generative Question-Answering with OpenAI's GPT-3.5 and Davinci

Captions
Today we're going to take a look at how to build a generative question-answering app using OpenAI and Pinecone. What do I mean by generative question answering? Imagine you go down to a grocery store, walk up to an attendant, and ask them where a particular item is. Or maybe you're not entirely sure what the item is, so you say, "where are those things that taste kind of like cherries but look like strawberries?" Hopefully the attendant will reply, "you mean cherry strawberries? Just go down to aisle two and they'll be on the left", or they'll take you there. That is basically what generative question answering is: you ask something a question in natural language, and that something generates natural language back to you that answers your question.

I think one of the best ways to explain this idea of generative question answering is to show you. We don't have that much data behind this demo, just a little over six thousand examples from the Hugging Face, PyTorch, TensorFlow, and Streamlit forums, but it's enough to get some interesting answers. What I'm going to do here is ask for a paragraph about a question: "what are the differences between TensorFlow and PyTorch?" We can limit the amount of data it's going to pull from by adjusting top_k; at the moment it's going to return five items of information, but that should be enough.

Let's go, and we get this: they are two of the most popular deep learning frameworks; PyTorch is Python-based and developed by Facebook, while TensorFlow was developed by Google; both frameworks are open source with large communities. The main difference, it says, is that PyTorch is more intuitive and easier to use, while TensorFlow is more powerful and has more features; PyTorch is better for rapid prototyping and research, while TensorFlow is better for production and deployment; PyTorch is more pythonic, whereas TensorFlow is more convenient in industry for prototyping, deployment, and scalability. It even encourages us to learn both frameworks to get the most out of our deep learning projects. If we have a look at where this information is coming from, it's mostly the TensorFlow forum, so maybe there's some bias there. We can actually click on these links and they take us to the original discussions, so we can see exactly where the model is getting this information from rather than just trusting that it's correct.

Another thing we can ask, something more factual rather than opinionated, is "how do I use gradient tape in TensorFlow?" Let's see what it says. Again we get an answer: it's a powerful tool in TensorFlow that allows us to calculate gradients with respect to input variables; to use gradient tape you first need to create a tape object and record the forward pass of the model; after the forward pass is recorded, you can calculate the gradients of the loss with respect to the model's variables by calling the tape's gradient method. It then continues with more information, like how we can use it to calculate higher-order derivatives, and so on. That's pretty cool, but let's have a look at how we would actually build this.
When we're building generative question-answering systems with AI, we naturally just replace the person with some sort of AI pipeline. Our generative question-answering pipeline is going to contain two primary items: a knowledge base and the AI models themselves.

The knowledge base we can think of as the model's long-term memory. In the shop assistant example, the assistant has been working in that store for a while, so they have a long-term memory of roughly where different items are; maybe they're very good and know exactly where everything is. They have this large knowledge base of everything in that store. When we ask our question, we're posing a natural language prompt or query to that person; something in their brain translates that language into some sort of concept or idea, they search their knowledge base, attaching our query to some nearby points within their past experience, and then return that information to us by generating natural language that explains what we're looking for.

Here's how that looks for our AI pipeline. We have an embedding model, and we have a question, some text. We feed that question into the embedding model; we're going to be using the text-embedding-ada-002 model, which is a GPT-3.5 embedding model. It translates our query into a numerical representation that captures the semantic meaning behind the query, so we get a vector in vector space. We then pass that to the knowledge base, which is going to be a Pinecone vector database. Within Pinecone, just like the shop assistant, we have what we can think of as past experiences: information that has already been indexed, already encoded by the GPT-3.5 embedding model and stored as vectors. So we essentially have a ton of indexed items, and we introduce our query vector and ask Pinecone to return, let's say, the top three most similar items. We then pass all of those to another OpenAI model, the Davinci text generation model. Those data points get translated back into text, natural language passages that contain information relevant to our particular query. We take our query as well, feed it in, and format everything in a certain way (we'll dive into that a little more later on). Davinci can then complete our question: we take the original query, say "answer that based on the following information", pass in the information we returned from Pinecone, and it will be able to answer with incredible accuracy.
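To make that flow concrete, here is a minimal end-to-end sketch of the pipeline just described, assuming the pre-1.0 `openai` and `pinecone-client` Python libraries from around the time of this video; the `environment` value and the `"text"` metadata field are assumptions (the index name `openai-mlqa` comes from the video, and we build all of this step by step below):

```python
import openai
import pinecone

openai.api_key = "OPENAI_API_KEY"  # placeholder
pinecone.init(api_key="PINECONE_API_KEY", environment="us-east1-gcp")  # environment assumed
index = pinecone.Index("openai-mlqa")

def answer(query: str) -> str:
    # 1. Encode the query into a semantic vector with ada-002
    res = openai.Embedding.create(input=[query], engine="text-embedding-ada-002")
    xq = res["data"][0]["embedding"]
    # 2. Retrieve the three most similar records from the knowledge base
    matches = index.query(xq, top_k=3, include_metadata=True)["matches"]
    contexts = [m["metadata"]["text"] for m in matches]
    # 3. Ask Davinci to answer the query using only the retrieved contexts
    prompt = (
        "Answer the question based on the context below.\n\n"
        "Context:\n" + "\n\n---\n\n".join(contexts) +
        f"\n\nQuestion: {query}\nAnswer:"
    )
    res = openai.Completion.create(
        model="text-davinci-003", prompt=prompt,
        temperature=0.0, max_tokens=400,
    )
    return res["choices"][0]["text"].strip()
```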
Now one question we might have here is: why not just generate the answer from the query directly? That can work in some cases, and it will actually work for a lot of general-knowledge use cases. If we ask something like "who was the first man on the moon?", it's probably going to say Neil Armstrong without struggling too much; that's a very well-known fact. But if we ask more specific questions, the model can struggle, or we can suffer from something called hallucination. Hallucination is where a text generation model produces text that seems like it's true but is actually not true, and that is just a natural consequence of what these models do: they generate human language, and they do it very well, but they aren't necessarily tied to being truthful. They can just make something up, and that can be quite problematic. Imagine the medical domain: a model might generate text about a patient's diagnosis that seems very plausible, using scientific terminology and statistics, but in reality it may be completely false; it's just making it up. By adding this knowledge base component, forcing the model to pull information from the knowledge base and then answer the question based on that retrieved information, we're forcing the model to be (almost) truthful and base its answers on actual facts, as long as our knowledge base actually contains facts. At the same time, we can also use this for domain adaptation: whereas a typical language generation model may not have a very good understanding of a particular domain, we can help it out by giving it knowledge from that domain and asking it to answer questions based on that knowledge. So there are a few good reasons for doing this.

To build this, we essentially need to do three things, the first of which is to index all of our data: we take all of our text, feed it into the embedding model to create our vectors, and pass those into Pinecone. That's the indexing stage. To perform this indexing stage, we come over to the pinecone-io examples repo, where these notebooks are stored at the moment; I'll leave a link on this video that I'll keep updated just in case they move in the future. This is our notebook, and we can also open it in Colab, so I'm going to do that (again, there will be a link in this video so you can go straight to it).

The first thing we need to do is install our prerequisites: openai, pinecone-client, and datasets. I have a few videos that go through this process in more detail, so I'll link to one of those, but right now I'm going to go through it pretty quickly. First is data prep. We're loading this ML Q&A dataset, which is a set of question-and-answer threads about machine learning from a few different forums, like Hugging Face, PyTorch, and a few others. All we're doing here, essentially, is removing excessively long items.

This next bit is probably more important. When we create these embeddings, we're going to include quite a lot of information: the thread title (or topic), the question that was asked, and the answer that was given. We throw all of that into the ada-002 embedding model; its context window is large (I think up to around ten pages of text), so this is actually not that much text for this model.
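As a rough sketch of that data-prep step — the dataset path and field names here are hypothetical stand-ins, since the exact schema isn't shown in the transcript:

```python
from datasets import load_dataset

# Load the forum Q&A dataset (path, split, and field names are assumptions)
data = load_dataset("some-org/ml-qa-forums", split="train")

# Drop excessively long records, then merge thread title, question, and
# answer into a single string so one embedding captures all three
data = data.filter(lambda r: len(r["answer"]) < 2000)  # threshold assumed
texts = [
    f"Thread: {r['thread']}\n\nQuestion: {r['question']}\n\nAnswer: {r['answer']}"
    for r in data
]
```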
To actually embed things we need an OpenAI API key. I'll show you how to get one just in case you don't know: go to beta.openai.com, sign up and create an account if you don't already have one (I do, so I'm going to log in), then go to your profile in the top right, click "View API keys", and you may need to create a new secret key if you don't have one saved elsewhere. Once you have a key, paste it in and continue running; if you run this cell, you should get a list of models back if you've authenticated correctly.

Let's go through the rest a bit quicker. We set the embedding model, text-embedding-ada-002, and create the embeddings. Then we create our Pinecone index, which we'll call "openai-mlqa". We need another API key for that, which you'll find at app.pinecone.io; after signing up, just go to the API keys section on the left, copy your key, and of course paste it in here. We create the index and connect to it, and that's everything initialized.

After that we just need to index everything. We go through our data in batches of 128, encoding each batch with the embedding model, and we also collect the relevant metadata, which is the plain text associated with each of our vector embeddings. We clean the embeddings out of the response format, then put it all together: the IDs (which are just a count in this case), the embeddings themselves, and the metadata, and we upsert all of those to Pinecone. Pretty straightforward. At the end we have these 6,000 vectors in Pinecone that we can use as our knowledge base. Usually you'd probably want far more items than this, but I think it's good enough for this example.
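A minimal sketch of that index-and-upsert loop, again assuming the pinecone-client 2.x API of the time; the `environment` value is an assumption, and `texts` is the list built in the data-prep sketch above:

```python
import openai
import pinecone

pinecone.init(api_key="PINECONE_API_KEY", environment="us-east1-gcp")  # environment assumed

index_name = "openai-mlqa"
if index_name not in pinecone.list_indexes():
    # ada-002 embeddings are 1536-dimensional
    pinecone.create_index(index_name, dimension=1536, metric="cosine")
index = pinecone.Index(index_name)

batch_size = 128  # encode and upsert 128 records at a time
for i in range(0, len(texts), batch_size):
    batch = texts[i:i + batch_size]
    res = openai.Embedding.create(input=batch, engine="text-embedding-ada-002")
    embeds = [record["embedding"] for record in res["data"]]
    ids = [str(n) for n in range(i, i + len(batch))]  # IDs are just a count
    meta = [{"text": t} for t in batch]               # keep the plain text as metadata
    index.upsert(vectors=list(zip(ids, embeds, meta)))

index.describe_index_stats()  # should report ~6,000 vectors once complete
```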
Now that everything is indexed, we move on to querying; this is the next stage, though not the final one. We have a user query, and what we do looks very similar to what we already did: we pass the query into the text-embedding-ada-002 model (the GPT-3.5 embedding model), which gives us our query vector, which we'll call xq, and then we pass that to Pinecone. Pinecone has metadata attached to each of our vectors, so if we return, say, three vectors, we also get the metadata attached to them. That's the querying stage; later on we also pass those results on to the generation model, which produces a more intelligent answer using all of the contexts we've returned.

To begin making queries, I'm going to use this making-queries notebook, and again I'll open it in Colab. I'm actually going to run through this one, because we're moving on to the generation part, which I haven't covered much before and want to go through in a lot more detail. So I'll run this; we should just see the prerequisites install, and then we move on to initializing everything: the Pinecone vector database, and the embedding and generation models. I'll actually start with Pinecone. I've stored my key in a variable called pinecone_key, and here we're just connecting to the index. We describe the index stats, and because we already populated it during indexing, we should see that there are already vectors in there, and we can: we have the 6,000. Then we need to initialize the OpenAI models; for that we need an OpenAI API key, which again I've stored in a variable called openai_key.

Okay, great. So the embedding model: let's see how we actually query with it. We set its name, text-embedding-ada-002, which needs to be the same model we used during indexing, and I'm going to ask the question we saw earlier: "what are the differences between PyTorch and TensorFlow?" Here we create our query vector, extract it from the response, and then use that query vector to query Pinecone with index.query, returning the top three items. Let's see what it returns. We have this general discussion, "I think this post might help you", and further down it's PyTorch versus TensorFlow, so probably pretty relevant; and we have three of those (we could retrieve more). We have another one here which is about TensorFlow.js, and so on. Within those results we should have a few relevant contexts that we can feed into the generation model as sources of truth for whatever it says.
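Sketching that querying step, carrying over the `index` connection from above (same assumptions as before, including the `"text"` metadata field):

```python
query = "What are the differences between PyTorch and TensorFlow?"

# Embed the query with the same model used at indexing time
res = openai.Embedding.create(input=[query], engine="text-embedding-ada-002")
xq = res["data"][0]["embedding"]

# Retrieve the three most similar records, including their stored text
res = index.query(xq, top_k=3, include_metadata=True)
for match in res["matches"]:
    print(f"{match['score']:.2f}: {match['metadata']['text'][:80]}...")

contexts = [m["metadata"]["text"] for m in res["matches"]]
```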
Now the generation stage is actually very simple. We have our contexts, and we take our query question and append those contexts onto the end of it. Before all of that, we also add a little statement which basically tells the generation model what to do; it's an instruction, something like "answer the following question given the following context". The question actually goes at the very end, so the structure is: the instruction, followed by the contexts (the ground-truth information we returned from our vector database), and then the question.

Let's see what that actually looks like. Coming down here, we first set a limit on how much text we're going to feed into the generation model. We have the contexts, which are what we returned from Pinecone: the text stored in the metadata of each of the matches. Then you can see the prompt we're building. We say "answer the question based on the context below", then we have the contexts, where we feed in the data we retrieved. That retrieved data is obviously several items, so we join them all with separators. After we've put that data in, we have the question, which is our query. And because this generation model is basically just taking all the text so far and continuing it, to keep with the format of instruction, context, question, we end the prompt with "Answer:", and GPT will simply complete this prompt and generate the answer.

That's the final structure, but it's not exactly how we build it; in the middle we use a bit of logic so that we're not putting too many of these contexts in. What we do is go through our contexts one at a time: we start with just the first context, then in the second iteration we try the first two contexts, then the first three, and so on, until we have all of them in there. The exception is if the contexts we've gone up to exceed the limit we set: at that point we build the prompt from that number of contexts minus one. So if we get to the third context and it's too big, we just take the first two; after that we break, and that produces our prompt. Otherwise, if we get through all of the contexts, we just join them all together.

At that point we move on to the completion, the text generation part. For text generation we use openai.Completion.create with the text-davinci-003 model. We pass in our prompt and a temperature, which controls the randomness of the generation; we set it to zero because we want it to be pretty accurate, though if we want more interesting answers we can increase it. We set max tokens, which is how many tokens the model should generate at most, and a few other generation parameters that you can read about in the OpenAI docs. One more thing: if you want to stop on a particular character, maybe you want to stop on a full stop, you can do so by specifying it in the stop parameter.
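A sketch of that prompt-building logic and the completion call; `query` and `contexts` carry over from the querying sketch above, and the exact `limit` value is an assumption (it just needs to keep the prompt within the model's context window):

```python
limit = 3750  # rough cap on prompt length in characters; value assumed
separator = "\n\n---\n\n"

prompt_start = "Answer the question based on the context below.\n\nContext:\n"
prompt_end = f"\n\nQuestion: {query}\nAnswer:"

# Fallback in case no context fits at all
prompt = prompt_start + prompt_end

# Append contexts one at a time until adding another would exceed the limit
for i in range(1, len(contexts) + 1):
    if len(separator.join(contexts[:i])) >= limit:
        # the i-th context pushed us over, so keep only the first i-1
        prompt = prompt_start + separator.join(contexts[:i - 1]) + prompt_end
        break
    elif i == len(contexts):
        # all contexts fit, so join them all together
        prompt = prompt_start + separator.join(contexts) + prompt_end

res = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    temperature=0.0,  # low randomness for accurate, grounded answers
    max_tokens=400,   # upper bound on generated tokens
    stop=None,        # set e.g. "." to stop at the first full stop
)
print(res["choices"][0]["text"].strip())
```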
So we run this, and now we can check what has been generated: we take response["choices"][0]["text"] and strip the whitespace from either end. It says: "if you ask me for my personal opinion, I find TensorFlow more convenient in industry; prototyping, deployment, and scalability are easier; and PyTorch more handy in research, more pythonic and easier to implement complex stuff". You can see it's pretty aligned with the answer we got earlier on.

Now let's try another one. Again we ask "what are the differences between PyTorch and TensorFlow?", but this time we modify the prompt: I'm going to say "give an exhaustive summary and answer based on the question using the context below". Before, we got a pretty short answer; now that I'm asking for an exhaustive summary and then an answer, I would expect it to be longer. Coming down here, yes, we can see it's longer. Part of it reads: they are two of the most popular machine learning libraries; TensorFlow is maintained and released by Google, while PyTorch is maintained and released by Facebook; TensorFlow is more convenient in industry, prototyping, deployment, and scalability. So we're still getting the same information, but it's being generated in a different way, and it also says PyTorch is more handy in research and more pythonic, and so on. It also includes this: "TensorFlow.js has several unique advantages over the Python equivalent, as it can run on the client side too". That's comparing TensorFlow.js with TensorFlow in Python, which is not directly related to PyTorch; but as far as I know there isn't a JavaScript version of PyTorch, so in reality the fact that TensorFlow.js exists is in itself an advantage. You can see why it was pulled in, although maybe it's not explained very well in this generation.

Okay, so that's it for this example of generative question answering using Pinecone and OpenAI's embedding and generation models. As you can see, this is incredibly easy to put together and incredibly powerful; we can build some insane things with this, and it can go far beyond what I've shown here with a few different prompts. Asking for a bullet-point list that explains what steps you need to take in order to do something is one really cool example that I like, and this is something we will definitely explore more in the future. But for now, that's it for this video. I hope all of this has been interesting and useful, so thank you very much for watching, and I will see you again in the next one. Bye!
Info
Channel: James Briggs
Views: 15,973
Keywords: python, machine learning, artificial intelligence, natural language processing, nlp, Huggingface, semantic search, similarity search, vector similarity search, vector search, gpt3, gpt4, text-embedding-ada-002, openai gpt 3, openai gpt 3 tutorial, openai gpt 3.5, gpt 3.5, gpt 3.5 explained, openai embeddings, openai api, openai api key python, new gpt4 model, openai semantic search, gpt 3 semantic search, generative ai, generative question answering, question answering nlp
Id: dRUIGgNBvVk
Length: 24min 5sec (1445 seconds)
Published: Wed Jan 11 2023