Building a ChatGPT Plugin for Lex Fridman Podcasts

Captions
Today we're going to take a look at a ChatGPT plugin that lets us search through podcast episodes and answer questions based on those episodes, plus a few other things. Let me show you what we have so far.

Here we're in ChatGPT. You can see up here we have a little Lex Fridman logo; that's because we are going to "ask Lex" some questions. We're going to ask Lex about the future of AI. ChatGPT is thinking, it decides it needs to use the Ask Lex plugin, and then it gives us a summary from a few different episodes. You can see it has also included the source of these episodes, which is pretty nice, and it's given us a few options. We also get these nice little cards at the bottom showing where these are coming from. So we get a pretty good summary of AI being discussed in Lex Fridman podcast episodes, and also this one as well, which I think is one of the courses he did at MIT. That's pretty cool.

Now, what if I want a little more information from a particular one of these episodes? Let's come down here and I'm going to say, "Can you tell me more about the Erik Brynjolfsson episode?" and see what comes up. It's again referring to the plugin: even though I didn't say "ask Lex" this time, it knows that I still want to use this plugin. There's a lot more information in this summary of that particular episode, which is really interesting. What it's doing there is searching through our entire database for that particular episode. Up here we were searching for a particular topic; now it's searching for a particular episode, which is pretty cool.

Let's continue this conversation a little bit: "Interesting, does Lex talk about space exploration at all?" See what we get. Cool, we're getting a few summaries like we did the other time. We get two here, one with Dava Newman and the other one with Ariel Ekblaw. Now that looks
pretty cool. This one's interesting: it's talking about building space megastructures. Let's ask about this: "Can you tell me more about building space megastructures?" And then it's honing in on that specific episode. We haven't specified that we want to talk about that episode, but it has seen the previous interactions and decided that we're probably talking about this particular episode, or at least that this information can be found in that particular episode. So again it has filtered down there, or maybe it just performed a search, I'm not sure, but we definitely get a very interesting and relevant answer here.

That is just a glimpse of what this sort of plugin can do, and naturally we can apply this to a ton of different podcasts, videos, or anything you want really. Let's dive into how I actually built this and what is actually happening when ChatGPT decides to use the Ask Lex plugin.

Let's come up to the top here, where we have our first question: ask Lex about the future of AI. We can open this to see what is being sent to the plugin, and we see that there's this "queries" field with the query "the future of AI". So we're querying "the future of AI". This is a semantic search, so we're using a vector database here. If that doesn't make sense, I'm going to explain it, but essentially you can think of this as being our question, and these are the parts, or chunks, of transcribed audio from the Lex Fridman podcast that are the most relevant, or the most semantically meaningful, to our particular question. These are pretty big, but if we go through we should be able to find somewhere where they're talking about AI. You see there's a ton of things being spoken about here, but ChatGPT is actually managing to find, okay, here: "when you start to get truly superhuman artificial intelligence, kind of by definition it'll be able to do a ton of things I hadn't thought of
and create a world that I can't imagine", you know, all this sort of stuff. So that's where it's getting that information from; it then summarizes it and gives it to us in that answer. And we can see that there are three entries there: the Erik Brynjolfsson episode, the Deep Learning State of the Art one, and the Andrew Ng episode, which is what we see down there.

Then there's this one, where I'm saying, "Can you tell me more about the Erik Brynjolfsson episode?" If we take a look, you see that the query has changed slightly. I'm saying I want to know more about this specific episode, and what we have done is given specific instructions to ChatGPT saying that if someone wants to know more about a particular episode, you can search for that particular episode by adding in this filter and filtering based on the title. It actually knows the title from the previous interaction: up here, it got the title in the metadata of that result, so it can then search for that particular podcast episode. I think that is pretty cool. And you can see in the results here that we're only returning chunks of text from that particular episode.

That's basically what is happening as we go through the rest of this. Here it will just be a search with the query, and then down here, I'm not sure what it's doing actually. Okay, so it's doing a search and it's also filtering for that particular episode, because it knows the information is in there.

So you can see a little bit of how this is working, but let's dive into a little more detail on how I actually built this. Let's try and visualize what we have built here. You just saw ChatGPT; you can think of it as almost like the front end of this whole thing. So we have ChatGPT over here. Now with ChatGPT, there are these plugins that you can connect to it, and the way that I think they work, because the code for that isn't
actually public, is that you are basically passing your plugin, which is like a tool for your ChatGPT instance to use, into the prompt, or the initial prompt, of ChatGPT, maybe even every prompt. Maybe it's reformatted on its way to ChatGPT, I'm not sure, but basically somewhere in the prompt there is something that says, "Hey ChatGPT, you can use these plugins," and then it will list each of these plugins with a little description of each one, which we actually write ourselves.

So the prompt may say whatever it normally says, and now it also says, "By the way, you can use the Lex Fridman (I'm just going to put 'Lex F') database, and it will help you answer questions about the Lex Fridman podcast, and when you see the words 'ask Lex' in the user's prompt, you should 100% use this." That's basically what we wrote.

So ChatGPT sees this, sees your question, and it will sometimes decide that it needs to use the plugin. If it does decide to use it, it is going to create a prompt to send to the Lex Fridman database. You already saw that prompt: it contains "queries", and within that you have a list of queries. So it can actually pass multiple queries in there, but we just have one query here. That's just how the thing is built, and I'll explain that later. That is being passed to this Ask Lex endpoint, or plugin, and then we have this response from the plugin.

Now, how do we get that response? That's probably the more complex bit that we will need to figure out here. Once we pass the query from ChatGPT into our plugin, we are in a space that doesn't necessarily have anything to do with ChatGPT. In this space it is our own code, or API, or whatever we have set up. The API that I've set up here is based on the ChatGPT retrieval plugin API that was created by OpenAI, and you can actually
see that here. We go to github.com/openai/chatgpt-retrieval-plugin, and it tells you here that the ChatGPT retrieval plugin lets you easily search and find personal or work documents by asking questions in everyday language. That sounds pretty similar to what we have done; it's just that rather than personal or work documents, we've used transcribed Lex Fridman podcasts.

So we have that retrieval plugin; we'll call it the ChatGPT retrieval plugin, CRP. We have set it up to use OpenAI's text embedding model, text-embedding-ada-002. You can basically think of ada-002 as the same as the GPT models that you've seen, but rather than generating text, this one generates numerical vectors based on the semantic meaning of the text that you give it. What I mean by that is you're creating these vectors that go into this vector space here. This vector and this vector here will have a similar meaning, whereas this vector all the way over here, because of the distance between it and the others, will have a different meaning. So it's like we are mapping human meaning into numerical representations that we can then use to perform searches.

So how do we perform searches through this space? Imagine we have our user query over here. What was our question? It was "ask Lex about the future of AI". So "the future of AI" is our query. It goes into the ada-002 model, and that creates an embedding and maybe places it over here. Then we know that these two items, these two documents that we already embedded, are very similar and therefore highly likely to be relevant to our particular query. And this whole thing that you have going on here, where you have all these vectors and you're searching through them, is handled by another service, which is the Pinecone
vector database. The Pinecone vector database is basically a way of searching through this numerical vector space in a very efficient way at scale. You can have hundreds of millions of these vectors, or basically documents, in there and you'll be able to retrieve relevant items within around a tenth of a second or so, which is super fast. That is all handled by the vector database component.

We are then going to return those items out here. We'll say that we're going to return the top three most relevant items, so let's say there was another item over here. We return those, and that is the output of our CRP, our ChatGPT retrieval plugin. They go into here and they go back to ChatGPT. Now ChatGPT has more information to answer the question. As well as the original text that we have over here, it also has where this text has come from: the podcast episode titles, the URL to the podcast episode, and everything else, which is exactly what you can see in here. In these results we have the original query and then the things that we returned: the text, and then the metadata, the title, and the video ID here, which I think is actually being used for the URL. That's something we should fix: in the source, or in the URL, we should add the actual URL for the video, but that's something we can do another time.

So what we have done so far is augment our original query with all of this other relevant information from the Lex Fridman podcast. Now our query is going to be a ton of text based on all that stuff, and we've returned it to ChatGPT over here, and we're saying, "I want you to answer the user's question based on all this information here. The user's question is", and then
it's whatever you put in here. That gets fed in, and then ChatGPT is going to answer your question. We saw that: we got this kind of response where it's telling you a ton of stuff and it's sourcing the information. Look, we can click on here and it takes us through to that episode. So this is the Erik Brynjolfsson episode.

But there is still a lot more to this whole thing than what I've just described. For one, there is the code to do all of this, which you can find if you go to GitHub. The plugin code itself is all here. This is the forked ChatGPT repo, and it basically contains the same code as what you would find in the ChatGPT retrieval plugin; it's just that I've modified it a little bit for our particular use case.

For example, if you go to the .well-known directory here, this is where you set the instructions for ChatGPT to understand your plugin, and you go to the ai-plugin.json file here. I've changed the name for the model, so the name that ChatGPT will use to understand this plugin, and the name that we understand it with. We also have the description: you can use this tool to answer user questions using Lex Fridman podcasts; if the user says "ask Lex" (we saw that "ask Lex" earlier on), use this tool to get the answer; this tool can also be used for follow-up questions from the user; and filters can be used to grab more information from specific podcasts, like by filtering for a specific podcast title.

So that's the primary context, the primary instructions I've given ChatGPT on how to use this plugin, and they're pretty light. Then down here we've set up where our API is hosted. We need to include that so that ChatGPT knows where to find the OpenAPI spec for our app. These are basically going to be the instructions that ChatGPT will use in order to understand how to actually programmatically
interact with our API, or our plugin. There are some changes made there as well, nothing significant. I've just put in the server (you have to do this), and then there are some instructions around how ChatGPT can use this again. So we say this is an array of search query objects, each containing a natural language query string, "query", and an optional metadata filter, "filter". We also say here that filters can help refine search results based on criteria such as document title or time period, and that these are particularly useful when a user asks for more information about a particular record. For example, the user may ask for more information after being provided with information from a specific podcast with the title "Our AI Future with ChatGPT". I'm using an example here so that ChatGPT knows exactly what to do: in that case, the filter field can then be used to return more information by using the title "Our AI Future with ChatGPT". So we've been very specific and explicit on how ChatGPT can use this.

So we've built that. Let me show you some very quick code that shows how I actually populated our database with all this stuff. There'll be a link to this notebook, in fact it will be somewhere around the top of the video right now, and you'll be able to run through it and do exactly what I'm doing here.

There are a few items that you need, and we need these in general for the rest of the plugin as well. That's the OpenAI API key, which you get from platform.openai.com, the Pinecone API key, which you get from app.pinecone.io, and these other things as well. So we run through this. We're using a pre-built dataset in this example, but if you wanted to do the downloading and transcribing yourself, you just open this and you can see all the code that I used there. I built this pod-gpt library to help with that, because it's just easier. So in reality there's
just a few lines of code. But that takes some time: downloading all these MP3 files from the YouTube videos and then transcribing them using OpenAI's Whisper takes a while. So there's also this option. I've already done all this for the Lex Fridman podcast, so you can just use my dataset here. It's hosted on Hugging Face Datasets, and there are 499 videos in there that have already been transcribed. Then all you need to do is initialize your Pinecone index and the embedding model that you're going to use to create those embeddings and store everything in Pinecone. That is going to be OpenAI, specifically the OpenAI text-embedding-ada-002 model. And then here we're just looping through all of the podcasts and indexing everything, and that is actually it. That's all I did in order to index all of this text.

From there I went over to DigitalOcean. DigitalOcean is a service where we can host APIs, which is pretty ideal for what we have. So we have this jamescalam Ask Lex plugin. I went over here, went to Create, chose Apps, clicked on deploy from GitHub, and deployed from this repo. Then it deploys and I end up with this: I have the Ask Lex app here. You can click on there, and this gives you the URL for your app. You can copy that and open it, and you will probably see this: "detail: Not Found". That's fine. It's just the URL of the API without the endpoints or any file extensions, so you're going to find this.

Then I took that URL, went over to ChatGPT (let's go to a new chat), went to the plugin store, and said "develop your own plugin", "my manifest is ready". The manifest is the ai-plugin.json file we saw. Then I pasted in the URL. From there we can see we've got a valid manifest,
and a valid OpenAPI spec, and then you can see we have all the information that I included in there. I've already installed it, so I'm not going to install it again, but from there all you do is go to your plugin store, go to unverified plugins, and you have your actual plugin here, and then you just go ahead and use it. So it's incredibly easy to put all this together. Obviously there are a few steps; it's not the most straightforward thing in the entire world, but it's definitely not the most complex, and given what you can do, it's pretty cool.

So now I can just ask about random things. I can say "ask Lex": I want to know what he thinks about World War II. I know this is a favorite talking point of his, so there are probably plenty of things to talk about there. So we get all this information, a pretty similar thing to before, and then we can drill down. We can ask, okay, who's this guy? I have no idea. So we can ask about that, like "tell me more about this episode", and it will actually be able to do this; it's just going to refer back.

We also don't have to use the Ask Lex plugin every single time. I'll let this generate and then we can ask another question. Maybe something about this makes you think, "Oh, I kind of want to know a little more about that." So maybe we can say: there's this Ian Kershaw biography of Adolf Hitler (you can search that and see what it is), so "what do you know about" that? Now I'm not sure if it's going to use the Ask Lex plugin. Okay, so here it's not using the Ask Lex plugin, because it doesn't need to; it probably already knows some stuff about this. Now, my internet isn't great, so it just cut out, but maybe I can try and regenerate that. Okay, so we get a ton of information about this particular book. That was kind of inspired by us reading
through the Lex Fridman information, but for this in particular we didn't need that in order to talk about this book. ChatGPT knows this and has decided, "I don't need to ask Lex this question; I can just give you all this information from my training memory," essentially. I think that is all fascinating, and it's really cool how quickly we can build something like this. It doesn't take that much.

Now, I haven't dived super deep into the technical side of things here. I showed you some of the code and the GitHub repo, and I think with that you can pretty much do all of this. But if you are struggling to follow along, or you just want a more technical deep dive, I do have another video on ChatGPT plugins, specifically the OpenAI retrieval plugin that we saw in this video. If you want to go into the technical details, I would definitely recommend following along with that video, as I go from start to finish through building a plugin, not specifically for podcasts but for online code docs.

For now, that's it for this video. I hope this has been an interesting and insightful look at the sort of things that you can build with these ChatGPT plugins. If you are looking for where to find these plugins, there is right now a waitlist to get access to them. I will make sure the waitlist link is somewhere in either the description of the video or at the top of the video right now, so you can go and sign up for that if you don't have access already. For now I will leave it there. Thank you very much for watching, I hope this has been helpful, and I will see you again in the next one. Bye.
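
The retrieval flow described in the captions above (a "queries" payload with an optional title filter, then a top-k similarity search over embedded chunks) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions and not the plugin's actual code: the `INDEX` contents, the three-dimensional vectors, and the helper names `build_request`, `cosine`, and `search` are all hypothetical. A real setup would get its vectors from an embedding model such as text-embedding-ada-002 and run the search inside a vector database such as Pinecone.

```python
import math
from typing import Optional

# Hypothetical toy index of (vector, text, metadata) entries. The tiny
# hand-written vectors stand in for real embeddings of transcript chunks.
INDEX = [
    ([0.9, 0.1, 0.0], "chunk about superhuman AI", {"title": "Episode A"}),
    ([0.8, 0.2, 0.1], "chunk about deep learning", {"title": "Episode B"}),
    ([0.0, 0.1, 0.9], "chunk about space habitats", {"title": "Episode C"}),
    ([0.7, 0.3, 0.0], "chunk about AI and jobs", {"title": "Episode A"}),
]


def build_request(query: str, title: Optional[str] = None) -> dict:
    """Build a body shaped like the one described in the video: a list of
    query objects, each with a 'query' string and an optional 'filter'."""
    item = {"query": query}
    if title is not None:
        item["filter"] = {"title": title}
    return {"queries": [item]}


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector: list, top_k: int = 3, title: Optional[str] = None) -> list:
    """Return the top_k most similar chunks, optionally restricted to one title."""
    candidates = [e for e in INDEX if title is None or e[2]["title"] == title]
    ranked = sorted(candidates, key=lambda e: cosine(query_vector, e[0]), reverse=True)
    return [{"text": text, "metadata": meta} for _, text, meta in ranked[:top_k]]


if __name__ == "__main__":
    print(build_request("the future of AI"))
    # A made-up query vector that points in the "AI" direction of the toy space.
    for hit in search([1.0, 0.0, 0.0], top_k=3):
        print(hit["metadata"]["title"], "|", hit["text"])
```

Passing `title="Episode A"` to `search` mirrors the follow-up behaviour shown in the video, where only chunks from the named episode come back.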
Info
Channel: James Briggs
Views: 5,734
Keywords: machine learning, artificial intelligence, natural language processing, nlp, semantic search, similarity search, vector similarity search, vector search, james briggs, chatgpt, chatgpt plugin, chatgpt retrieval plugin, chatgpt python, chatgpt plus, lex fridman podcast, openai, openai chatgpt, openai chatgpt tutorial, openai chatbot tutorial, massive upgrade to chatgpt, ai chat, chat gpt ai, gpt 3 chatbot python, gpt 4 chatbot, gpt 4, openai plugins, chatgpt-retrieval-plugin
Id: bAQ6VRewf0w
Length: 28min 0sec (1680 seconds)
Published: Tue Apr 11 2023