Build overpowered AI apps with the OP stack (OpenAI + Pinecone)

Captions
Okay, people are starting to trickle in, so we're going to dive in. Welcome, everybody, to Build Overpowered AI Apps with the OP Stack (OpenAI and Pinecone). My name is Amanda Wagner, I'm the senior community manager here at Pinecone, and we are absolutely thrilled to have you participating in this event. This event is directly related to what we've seen you building with Pinecone, so we hope you walk away from it with actionable information and feeling inspired.

For those of you who don't know who Pinecone is (granted, the assumption is that the majority of you do), Pinecone is a vector database that makes it easy to build high-performance vector search applications.

Before we dive into the content, a few housekeeping rules. Number one, we ask that you use the chat for chat. We have time at the end of every event for questions and answers, so to help us out, and to make sure we answer all the questions you have, please place your questions in the Q&A portion of Zoom. If you miss something, do not worry: this event is being recorded, and we will share it with you via email, on our YouTube channel, and on social media; you cannot possibly miss it, and if you are not following us on those outlets, I highly recommend you do. If questions pop up after the event, you can email me personally and I will operate as your inquiry liaison; you can reach me at amanda@pinecone.io and we'll make sure to get those questions answered. You can also reach out if you have an idea for an event or want to collaborate with us on some content. Now, without further ado, I'm going to send it off to James Briggs, our developer advocate at Pinecone, who's going to provide some context around why the OP stack is so powerful. James, take it away.

Thanks, Amanda. So the OP stack is obviously a pretty new thing, but I actually want to start by going back to before we had all these cool technologies, and talk about what I remember as the stages, the typical day in the life, of someone working with NLP. When I first started, it was all LSTMs which, for those of you that don't know them, are from a while ago; they went out of fashion around 2017-2018. With those models you'd end up training for days in order to do anything. If you wanted to do some classification or question answering, first, you couldn't do it very well, and if you wanted to do it on a particular dataset, you'd end up training your model for days. You needed all this data; it was kind of fun in its own way, but it was definitely not efficient, and the results were just nothing compared to what we can get now.

After that we ended up with transformer models and what we call transfer learning, and at that point it seemed incredible: you could take a transformer model that Google had trained, spending literally hundreds of thousands of dollars on training, add a couple of layers onto the end of it, and fine-tune it for a particular task. It would probably take an hour or so, you'd need a lot less data, and the results could be way, way better. That was incredible. And then you have OpenAI come along, with GPT-2 initially, then GPT-3, and now, even more recently, the GPT-3.5 models.
I'd used GPT-3 a while ago, but the first time I properly used it, and the first time I used it with Pinecone, was back when we were doing an event with OpenAI around summer of 2022, and my mind was completely blown by how incredible the performance was. You could ask it questions and it would answer them very accurately, it would cite the source where it got the information from, and you could also ask it to answer in different styles: bullet points, conservative Q&A, all these really cool things. That was a moment where I was incredibly impressed by how far things had come. And since then things have just gotten better: that was a GPT-3 model, now you have the GPT-3.5 models, which are way better, and you also have OpenAI's embedding models. They recently released the newest of those, text-embedding-ada-002, and the quality of these models compared to the ones back in summer, not even that long ago, is much better, and they're much, much cheaper to use; both of those are incredibly useful.

So things have gotten a lot better, but even with those releases, the space wasn't getting that much attention; there was a fair bit, but not that much. Then we had ChatGPT, which blew the doors right open, and now everyone is working on this sort of thing. It's kind of crazy how much attention this space has got: I think in the most recent YC batch, something like 15 percent, some crazy number like that, are using language models or generative AI in some way or another. I've been speaking to tons of people from all over, and everyone is super excited about this. There are people coming from web development who have never touched machine learning before, and they're building apps that are incredibly cool: you can talk to books, or search through your favorite podcast, all these really cool things that work incredibly well, and these people had never even touched these technologies before. Even more recently I'm seeing people from completely different industries coming in: people in finance who have never really coded much before, maybe a little, people coming in from healthcare, all these different places, super interested in actually building things with this, because it's incredibly easy to use and incredibly powerful.

But at the same time, there are issues with these models. We've seen it a lot with the recent chatbots: they come up with answers that are not always accurate, and they do this quite a lot. More importantly, you don't know where the information is coming from. That's where Pinecone comes in, as an external knowledge base. You put these two things together: you have an external knowledge base, which could hold any information you want, and you ask these incredibly powerful language models to refer to that knowledge base and give you answers based on it, or do a ton of other things. There's an incredible number of things you can do with this, and I really think this level of excitement, and what you can do with these models, goes far beyond what we had before; it's almost like an inflection point in NLP, in AI.
It's super exciting to actually see what people are building. And with that, I'm going to hand it over to Stephen, who's got a really cool app; I'm sure a lot of you will find it absolutely fascinating. As I said, these apps built with OpenAI and Pinecone are incredible. So I'll pass it over. Stephen, enjoy.

Hello, all right, that's working, cool. Thanks for the intro. So I'm filling in on short notice to show you a cool little app I made. My day job is not at all in this kind of thing; I make AI for drones and Earth observation, that kind of thing. But I started playing around with GPT-3 a couple of weeks ago, or maybe a couple of months ago now, because a friend of mine makes web services and websites, all kinds of things, for large enterprise customers, and they were looking at modernizing their interaction stack. So I was looking into how to build some cool things on top of GPT-3, and I quickly ran into the two limitations James just talked about: one is providing accurate information, so knowing where your model is sourcing information from, and the second is the size of the data you can inject.

The best way to provide good information to your model is something called context injection, which I'm going to show now by sharing my screen, if this all still works. There we go. All right, can you guys see this? Hands up or something if it works. Looks like it works. Okay, so what you want to do is inject context into a query, and the way you do that is by retrieving context from somewhere. You can do that by having local data in your web application, for example, but if you want a lot of data you quickly run into limitations there, so you need something like a vector database, like Pinecone, to store that information.

So I'm going to show you the little app I made that's online right now. It's called GPTflix; you can go to gptflix.ai, which redirects to the Streamlit app. What it does, basically, is use GPT-3 on the back end, and GPT-3 is forced to source its knowledge from the vector database on Pinecone. On Pinecone I have this database, called 400k-movies. What I did was take a dataset from a Kaggle research competition that had 400,000 movie reviews (it actually had a few more than that), and I appended some plots, from another Kaggle dataset, so texts about the stories of the movies, about 50,000 of those. I uploaded all of this to the vector space here on Pinecone, so I have a database of 452,000 vectors, and each of these vectors has some metadata, which is the text I want to retrieve when I do a search.

So what happens when you talk to GPTflix (it's online now, you can go there, but don't all go at the same time, because Streamlit is kind of overloaded right now): you can ask a question like "What is Men in Black about?", and this sends a request to Pinecone. The text of the question is converted to an embedding using the OpenAI embedding model, that embedding is compared against the database on Pinecone, the database retrieves all the relevant data, and then OpenAI synthesizes that data into the response. And in the API call, I'm forcing the OpenAI API to use a setting called temperature, set to zero, which means it can't improvise: it has to use the sources from the context database that I gave it.
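As a rough illustration of the flow Stephen is describing (not his actual GPTflix code), here is a minimal sketch using the OpenAI and Pinecone Python clients as they looked in early 2023; the index name, prompt wording, and key placeholders are invented:

```python
import openai
import pinecone

openai.api_key = "YOUR_OPENAI_KEY"           # placeholder credentials
pinecone.init(api_key="YOUR_PINECONE_KEY", environment="us-east1-gcp")
index = pinecone.Index("movies-demo")        # made-up index name

def answer(question: str) -> str:
    # 1. Convert the user's question into an embedding.
    emb = openai.Embedding.create(
        input=question, model="text-embedding-ada-002"
    )["data"][0]["embedding"]

    # 2. Retrieve the closest passages from Pinecone (text lives in metadata).
    results = index.query(vector=emb, top_k=5, include_metadata=True)
    context = "\n".join(m["metadata"]["text"] for m in results["matches"])

    # 3. Prepend the retrieved context and ask GPT-3 with temperature=0,
    #    so it sticks to the provided sources instead of improvising.
    prompt = f"Answer using only this context:\n{context}\n\nQ: {question}\nA:"
    completion = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, temperature=0, max_tokens=256
    )
    return completion["choices"][0]["text"].strip()
```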
So to make that a little clearer, we're going to build another application, really quickly, and I'm going to show you how easy it is. I'll go through the steps fast; if you don't catch everything, that's fine, you can slow it down and watch it later. I've created another database here called gpt-wikipedia, which is empty right now, and I've downloaded some data in JSON format, a text dump of all of Wikipedia in English. It's relatively well formatted, for the most part; some of it is kind of messy and some of the characters are not correct, but that's okay. We can take that and ingest it in a variety of ways, so I've written some code to ingest it. You can see there's lots and lots of data: each of these JSON files is 40 megabytes, and it's all text. I built a little script that parses these JSON files and turns them into a CSV file; I limited its length, because otherwise this demo would take forever, and it gives us this file called wiki-converted. So this is just a CSV file of text, all these Wikipedia entries.

I've built an application that's going to use this database; it's called Wikipedia GPT, and it lives on Streamlit. I can ask it questions, but right now it's just naive GPT-3: it doesn't know anything from this database yet, because the database is empty over here. So I ask it this question, which I chose because I know the content is in the data I'm going to upload: "Who was Pierre de Rauzan?" It says Pierre de Rauzan was a French architect and sculptor alive in the 17th century. GPT-3 is making this up. I'm trying to force it not to make things up, but even so, you can't limit it completely, and that answer is completely wrong: if you actually look this guy up, he was a Bordeaux wine merchant in the 18th century who did a bunch of interesting stuff with wine. But that's not what GPT-3 thinks, because it doesn't know; it doesn't have that context.

Since we have all these JSON files, and I happen to know one of them is about this guy, we can convert them into a clean little CSV file (that's more for visualization than anything else). The CSV just contains the text content, and if we look in here, somewhere there's text about Pierre de Rauzan; you can see it's down here. This is the actual Wikipedia article, or at least the top part of the text. What we can do very easily, using the OpenAI stack, is send this text, in token form, to the embeddings model, and get it converted to a vector. It might be too big to show, it might take too long to process, but we'll find that this article has an associated vector of 1,536 columns, which is the embedding size for OpenAI's model. And since we have the text and its associated vector, we can now just upload this to the Pinecone database. I've already run this before, so let's upload to Pinecone.
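Again purely as a sketch of an ingestion script like the one Stephen is running (assuming a CSV with a text column and the same early-2023 Python clients; the file name, index name, and batch size are invented):

```python
import csv
import openai
import pinecone

openai.api_key = "YOUR_OPENAI_KEY"
pinecone.init(api_key="YOUR_PINECONE_KEY", environment="us-east1-gcp")
index = pinecone.Index("gpt-wikipedia")   # assumed index name from the demo

BATCH = 100  # embed and upsert in batches to keep requests fast

with open("wiki_converted.csv", newline="", encoding="utf-8") as f:
    rows = [r["text"] for r in csv.DictReader(f)]

for start in range(0, len(rows), BATCH):
    chunk = rows[start:start + BATCH]
    # One embeddings call can take a whole list of inputs.
    resp = openai.Embedding.create(input=chunk, model="text-embedding-ada-002")
    vectors = [
        # (id, 1536-dim vector, metadata carrying the original text)
        (str(start + i), d["embedding"], {"text": chunk[i]})
        for i, d in enumerate(resp["data"])
    ]
    index.upsert(vectors=vectors)
```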
What's going to happen here is that we take the CSV and it starts populating the database. Right now there are zero vectors in here. (This other one is the index for GPTflix; it's full, with 450,000 vectors, which almost fills up the complete collection here.) And if we go to gpt-wikipedia, you can see it's starting to fill up now: the script is going through the data and uploading all of it. It's really, really fast; I'm adding a lot of data very quickly, and all this data is then indexable and queryable using GPT-3. So I've got about 8,500 articles, something like that: the first 8,500 articles in my file are already searchable in the database.

This seems to have crashed... I don't know whether our guy is within those articles, but if he is, and we run the query now, it's already searching that database, so it should return the correct answer, assuming that article has actually been uploaded; it might not have, because we might not have got there yet. We'll see in a second. So what's happening now... I've sent the request... ah, there we go. This is correct. So this data was uploaded that quickly, very easily, to the database, and now we get the Wikipedia article.

What happened in the background is that when I sent that request, "Who was Pierre de Rauzan?", it was converted to embeddings, the embeddings were compared on the Pinecone database to all the data that's already indexed in there, and then (the logs here are actually from a previous question, for some reason) the content retrieved from the database is prepended to the question. I can show you in the code, it'll make more sense: when I ask a question, the question is formatted as "your name is WikiGPT..." and so on, and then there's a tag called context, and loaded into that context is the content retrieved from the Pinecone database. It's limited by token size, because there's a limit to how much GPT-3 can ingest, but we're giving GPT-3 the information that's closest to the content of your question, and GPT-3 then just needs to summarize that information in a smart way, which, as you can see, it does very well.

So this entire app, or at least the content in its database, was literally just uploaded here in real time, and we could let it run forever and populate it with absolutely everything from Wikipedia, and then you'd have a fully searchable Wikipedia using Pinecone and OpenAI as a back end. There you go, that's my demo. Back to Amanda.

Thank you so much, Stephen, really helpful. Just as a reminder, we want to get to everybody's questions, so if you have them, please add them to the Q&A portion. And now I'm going to introduce David Greshel; he's a senior marketing manager at HubSpot and creator of Self-Service Chat and textmywedding.com. David, you're so busy, you're doing a lot. David, why don't you show us what you're working on?

Awesome, yeah, thanks Amanda. So again, my name is David Greshel. I'm a senior marketing manager at HubSpot, focused on the customer experience and chatbots, and I create some apps on the side. I'm here to go over how to supercharge your user self-service using OpenAI and Pinecone. Can everyone see my screen? Good, all right.
Perfect. So, what is self-service? You may be asking yourself that, because it's kind of a newer term. Self-service is enabling your users to find answers to their questions without reaching out to your support team. This can be done via a chatbot, a UI in your app, email automation, however your customers reach out to you. You can see an example from Canva on the left: they have a nice little UI in their app. Say you get stuck on a design; you click the help button and find answers to your questions without reaching out to their support team, and they also show recommended questions based on where you are in the app, things like that.

So now it's time for a demo, enough of the talk. I'm going to put my customer hat on and show you how it works in real life. As Amanda said, I've created this app called Text My Wedding, which lets you send and schedule messages for your wedding: schedule changes, transportation, etc. So imagine I'm a user: how do I send a message to my groups? I go to guest management, here are my groups, I'm not sure how to send a message, but here's a chat, and I don't want to wait for an answer, so let's click help. We ask "how do I send a message?", and there's my answer: sending a one-off message. It gives the answer, plus a nice little GIF showing how to do it so I can go do it in the app, as well as related questions. Maybe I want to send an individual message; how do I do that? It shows me that too, with more related questions. As another demo, say your app is more focused on code: here's one based on the Next.js docs. "How do I fetch data on the client side?" It gives you the answer; these are generated by GPT-3 and stored in the database, which we'll go over later, and as you can see, it returns the answer explanation with some code on how to do it. So, basically, enabling your users to find answers to their own questions.

Back to the presentation. You may be asking yourself: that's all great, but how do I do this, how does this work? The first step is to take your documents (in my case, for Text My Wedding or Next.js, that's knowledge base content) and get the embeddings via OpenAI. How you do this doesn't matter: you can have a build step so that when you change your documents, it gets the embeddings and stores them in Pinecone; you can create a CSV and do it manually; whatever works easiest for you. Now that we have the embeddings, we need to store them so we can query them later, and this is where Pinecone comes in: you store those embeddings in the Pinecone database. Then, when you want to help your user self-serve, you have your UI, chatbot, whatever medium you use; you take the user query (what they're asking, in our case "how do I send a message?"), get the embeddings from OpenAI, and then query the Pinecone database and return your documents.
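As David points out below, this whole flow is just two API calls, so any language works. Purely as an illustration, here is roughly what those two calls look like as plain HTTP requests (the Pinecone index URL is a placeholder for whatever endpoint your own index exposes):

```python
import requests

OPENAI_KEY = "YOUR_OPENAI_KEY"                  # placeholder credentials
PINECONE_KEY = "YOUR_PINECONE_KEY"
PINECONE_URL = "https://your-index-xxxxxxx.svc.us-east1-gcp.pinecone.io"

# Call 1: turn the user's query into an embedding.
emb = requests.post(
    "https://api.openai.com/v1/embeddings",
    headers={"Authorization": f"Bearer {OPENAI_KEY}"},
    json={"model": "text-embedding-ada-002",
          "input": "how do I send a message"},
).json()["data"][0]["embedding"]

# Call 2: send that embedding to Pinecone and get the closest documents back.
matches = requests.post(
    f"{PINECONE_URL}/query",
    headers={"Api-Key": PINECONE_KEY},
    json={"vector": emb, "topK": 3, "includeMetadata": True},
).json()["matches"]

for m in matches:
    print(m["score"], m["metadata"])
```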
Maybe you're asking: what kind of documents can I store, what should I store? The world is your oyster here; it's up to you. In our demo I used the knowledge base for Text My Wedding, super simple, same with Next.js. You can do videos, using OpenAI's Whisper models to transcribe them, and store those in Pinecone. You can store frequently asked questions. Something I'm trying now is storing queries: basically taking what users ask, storing those in the database, querying against them to get a similar match, and matching that with an ID to a document in another database, which I've had good success with. You can put code in there, and something else I'm trying is product tours: someone asks "hey, how do I send a message?", we match that in Pinecone and return a specific field in the metadata that says "launch product tour", and it takes you around the app.

That brings me to my next point: one of my favorite parts of using Pinecone is the metadata fields. You can store a lot of different things in there, and this lets you filter content based on specific user preferences. Say you know your user likes videos: you can filter your content based on that. Different locations in your app might want different answers, like on a pricing page versus a marketing page, etc. The great thing is it can all live in one database.

All right, your next question: this is all great, I've seen OpenAI and Pinecone everywhere, but I'm not a Python dev. That was a question I had when I first started, because I know JavaScript, and I'm a marketer. But it doesn't matter; you can use any language you like, because it's two API calls. Whether you're a Go, Rust, Java, Python, or JavaScript developer, it doesn't matter. Here are two non-production-ready code examples, just to give you a brief overview: on the left we make a call to the OpenAI embeddings endpoint and get back the embeddings, and then we query Pinecone; with the Pinecone endpoint, you send your embeddings and get back your documents, to return to your users in whatever medium you'd like.

All right, enough of the how. Why should I do this; what's the benefit to me and my users? I know all of us have sat on a support phone line, or in a chat, waiting to be connected to someone, waiting to find an answer, and we all know how frustrating that is. Helping your users self-serve increases user satisfaction: if you're using an app and find your answer right away, and the right answer, you're happy, and this leads to increased CSAT and NPS scores, as well as increased retention. We know time-to-value in an app is super important: if a user comes in and doesn't see value, they bounce and find something else that provides what they're looking for. So shortening that time to value matters, as does helping retain them: if they find value, they're more likely to keep using your product, self-serve, see more value, and eventually upgrade or stick with your platform. And I don't know how many of us here are solopreneurs, but me, I love to build; I like building features even if they don't make sense. We all do. You can spend more time doing that, or marketing, as we all should, instead of answering repetitive support questions: put those hours back into growing your business, working on it instead of in it.

So what can you do beyond this demo? This was a pretty simple demo, basically semantic search, but there are so many things you can do. One of my favorites, and something I use almost every day at HubSpot, is clustering: taking all your user queries and grouping them into similar groups to see what users are asking about most. You get a good idea of what users want, and you can even run it through OpenAI to get sentiment on it, to see the spiciest or hottest friction points users are running into in the different clusters.
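A toy sketch of the kind of query clustering David describes (not his actual HubSpot pipeline), assuming scikit-learn and stand-in embedding data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for real data: one embedding row per user query.
# In practice these would be the same ada-002 vectors (1536 dims)
# you already computed for semantic search.
rng = np.random.default_rng(0)
query_embeddings = rng.normal(size=(500, 1536)).astype("float32")

# Group queries into k topics to see what users ask about most.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0)
labels = kmeans.fit_predict(query_embeddings)

# Count how many queries landed in each cluster; the biggest clusters
# are your most common support topics (then inspect a few queries from each).
sizes = np.bincount(labels)
print(sorted(enumerate(sizes), key=lambda t: -t[1]))
```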
You can do anomaly detection, to spot outliers in the data, and recommendations: you saw in the demo that we were able to recommend different questions based on what users are asking, so they can continually self-serve. You can do generative Q&A; I would love to show an example of that, but I hit my hard limit on OpenAI credits last night and haven't been able to get in contact with anyone. Basically, that means feeding the knowledge from Pinecone, as shown earlier, into a prompt for GPT-3 and returning a more specific answer than the answers we pre-generated beforehand. And then classification, searching for specific labels or text. So thank you, that was my presentation. If you have questions afterwards, you can email me or reach out to me on Twitter, and I will pass it back to Amanda.

Thank you so much. You can all come on camera; we have a ton of questions from the audience, and we're going to do the best we can to get to as many as possible. David, let's start with you. Chris is wondering (he says nice UI, David): what libraries and tools did you use to build the site?

That is all built off of Cloudflare Pages, Pinecone, and OpenAI, and, well, React and Next.js.

Cool, thank you. Stephen, going back to you: Seth mentioned that he's building a Q&A that reads a site and answers questions. Some questions are more general, like "what is the best thing about your company?", whereas some are more specific, like "what is your address?". He's had better luck with the specific questions; what are strategies for synthesizing more general answers?

Well, that's the limitation of how the context injection system works with these models. When you input a query, it's converted to embeddings, so the content of the query matters a lot as to what will be closest in the vector space on the Pinecone database. If you can get your users to include in their question the subject matter they want a response about, it'll work a lot better. If you go on GPTflix and include the title of a movie, it works really well; if you ask "what was that Will Smith movie where he was fighting aliens in a black suit?", that probably won't work, unless the content of your database, where the possible responses live, contains a lot of those keywords. In the end you're searching in word space, so to speak.

So if you have people asking very general questions, one technique that can work, and something I've been playing with (I don't have it in this demo), uses the metadata you add yourself to each of your text contents. For example, if people ask questions about your business, and some of those questions relate to a specific type of technology, then for all of the content related to that technology in your database, even if those pieces of text don't contain the name of the tech, you could add it before you put the data in the database and before you calculate the embeddings. That way you'll have that proximity added somewhat artificially, but it'll be in the data. And then on Pinecone, on the API side (I'm not an expert, but from what I've seen), you can retrieve the top 100 closest pieces of data, so that also gives you quite a lot of scope. The way I'm doing it in this demo is very naive: I'm just taking the top pieces of data until I run out of context space.
But you could be a lot smarter about that: you could actually pass the top 100 pieces of data and build a prompt, one the user never sees, on the back end, that really looks through all of that data and finds the most relevant answer. There are many, many ways to build that out, but at that point you're getting into the complexity of how language works, beyond proximity in the database. So the answer is: you can do it; there are just many ways to do it.

Okay, thank you, super helpful. James, this one I'm going to point in your direction, but if you guys have input, feel free to chime in. Zaheed says: I'm pretty new to vector databases; what kind of metadata is appropriate to store in Pinecone, versus storing data elsewhere (SQL) and storing an ID in Pinecone? I'll mute myself.

Yeah, of course. So this depends on your use case and what you want to do. There are some limits: if your metadata is something like a string, int, or float, you can store it, but if your total metadata per vector is more than 10 kilobytes, that's too much and you need to trim it down. As long as you stay within those limits, you're okay and you can store it in Pinecone. What we tend to see is that people either just put everything in there because it's easier, or, if they need to be more selective, they store, say, text data, a big chunk of text, externally: you can't really do much with that when it's in Pinecone, it's just kind of sitting there. Then, if you have metadata like dates or document categories, something along those lines, you can store that in Pinecone, and what you can do with it is something called metadata filtering. Say you only want to search documents from the HR department, for example: you just add HR to your filter and it will do that. Or if you only want to search recent documents, you can do that too: you just say anything greater than some timestamp. So it depends on what you're doing; you can store a lot of stuff in there, but realistically, most of what you store there you'll probably want to use for metadata filtering or something along those lines.

Cool, thank you. And this one is kind of for anybody: any tips on chunking data before inserting it into Pinecone's database? Have you seen approaches that work better than others?

Yeah, go ahead. Sorry, James, you go. Okay, I'll give a very quick one. My go-to with this: it depends on what model you're using, and also on how much data you need for the text to be meaningful. Basically, the chunk size needs to be large enough that whatever you're capturing is meaningful, but you don't necessarily want to go too large, because when you're putting that into your generation model, it's going to cost you more, and so on. That said, there are benefits to increasing it. The one thing I'd add is that it's usually a good idea to add some overlap between your chunks: say you chunk five paragraphs at once; include one or two sentences of overlap between the chunks, so you're not missing any potentially connecting information that you might otherwise cut in the middle of. And try to chunk on things like newline characters, or spaces if you really have to. Beyond that, I think those are the rules of thumb; otherwise it depends on your data.
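A minimal sketch of the kind of overlapping chunker James describes: paragraph-based, with a couple of sentences carried between chunks. The sizes are arbitrary, and in practice you might count tokens and use a proper sentence splitter rather than this naive one:

```python
def chunk_text(text, paras_per_chunk=5, overlap_sentences=2):
    """Split text into ~5-paragraph chunks, repeating the last couple of
    sentences of each chunk at the start of the next, so connecting
    information isn't cut in the middle."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, carry = [], ""
    for i in range(0, len(paragraphs), paras_per_chunk):
        body = "\n\n".join(paragraphs[i:i + paras_per_chunk])
        chunks.append((carry + "\n\n" + body).strip() if carry else body)
        # Naive sentence split; a real implementation might use nltk/spacy.
        sentences = body.replace("\n", " ").split(". ")
        carry = ". ".join(sentences[-overlap_sentences:])
    return chunks
```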
Yeah, exactly the same thing: keep the overlap so you're not losing context. And the other thing is that the database itself on Pinecone, compared to what you'd want for most consumer applications, like answering questions about a store or something like that, gives you loads and loads of space. From my own tests, it often makes sense to make smaller chunks with more overlap, and then, when you retrieve the closest content, just keep a wider window, so you take, say, 20 pieces of context instead of the top two. That way you can still get all of the information, and you won't fill up the database very quickly; unless you upload all of Wikipedia, in which case you will fill it up, I can tell you that.

This one's for you, Stephen: are there any other metrics you can use for similarity, other than cosine similarity, for word embeddings?

I haven't explored that at the moment. On Pinecone I see there are three different ways to calculate proximity for the vectors, but I don't think anything else would really make sense with the way the OpenAI embeddings are built, in this context. So I don't know, but I doubt it would be useful to do it another way right now.

Cool. Jacob Lee has a question; I'm not 100% sure who it's geared towards. Also, for those of you participating, feel free to add your own answers to these questions in the chat. Jacob Lee says: really amazing demos; I was wondering if there is a GUI way of viewing or editing what is stored in the vector database. And attaching a second question: is there a way to search through all namespaces within an index?

On searching across namespaces: a search goes through a single namespace, so if you want that, a good way is to keep everything in a single namespace and use metadata filtering; when you do a search where you'd essentially want to go into one namespace, you just filter for a particular item. And for the other question... what was the other question again, Amanda?

Whether there is a GUI way of viewing or editing what is stored in the vector database.

Yeah, you can. It's limited, but you have the Pinecone console, so you can view everything in there and do a little bit in there, but you'd need to use the client, or the REST API, to do everything.

Okay, sticking with metadata: William Moore would like to know more about the use of metadata. The idea would be to have a context for a user input; for his use case, he'd like to answer a question differently depending on whether the user is a current client or not. Is this where metadata is used, and how?

I would say no: metadata is more specific to your data, rather than to whoever is asking the question. In that case, if you want to answer the question in a slightly different way for a certain user, you'd modify the prompt that you're feeding into the large language model. Or, I just thought, maybe this is more aligned with what you were thinking: if you want to search a different dataset based on whether this is a logged-in user or not, then yes, in that case you'd use metadata filtering. You'd have something like a client field in Pinecone set to yes or no, and if they're logged in, it's yes, and you filter for those items in your vector database.

Yeah, or you could just have a separate namespace for logged-in users. You can filter if you want to; there are many ways you could solve that problem, really.
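To make those filtering examples concrete, a sketch with the Python client; the index name and field names (department, timestamp) are made up for illustration:

```python
import openai
import pinecone

openai.api_key = "YOUR_OPENAI_KEY"
pinecone.init(api_key="YOUR_PINECONE_KEY", environment="us-east1-gcp")
index = pinecone.Index("docs-demo")       # made-up index name

query_embedding = openai.Embedding.create(
    input="how do I request time off", model="text-embedding-ada-002"
)["data"][0]["embedding"]

# Only search documents from the HR department:
hr_docs = index.query(
    vector=query_embedding, top_k=5, include_metadata=True,
    filter={"department": {"$eq": "HR"}},
)

# Or only recent documents: anything with a timestamp above a cutoff.
recent_docs = index.query(
    vector=query_embedding, top_k=5, include_metadata=True,
    filter={"timestamp": {"$gt": 1672531200}},  # Unix time for 2023-01-01
)
```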
Aaron Dunn wants to know: what guidance can you provide on splitting up the text before creating the embeddings? How many words per embedding vector? Is it generally preferable to maximize the size of the input text that goes into each vector, or to minimize it?

I'm a little confused by the question, but I guess it's: do you want to use the maximum possible size of context before you vectorize the content? The reality is that the maximum context depends on the model you're using. Obviously you can use this with the whole GPT-3 back end, and many other things, but that's what we're talking about here, and most people are building on text-davinci-003 right now, which is the biggest model. It has a context length of about 4,097 tokens, which is a lot of words; that's a couple of pages of text. And generally, if you're just trying to retrieve one salient point of information about something precise, that's far too much context. So in reality, when you're building an app like this, it would be quite rare to fill that before converting something to embeddings; you'll want to chunk things down into smaller pieces, so you don't really encounter that question.
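When sizing chunks against a model's context window, it helps to count tokens rather than words. A quick sketch assuming OpenAI's tiktoken library; the example string and budget are arbitrary:

```python
import tiktoken

# Tokenizer matching the completion model discussed here.
enc = tiktoken.encoding_for_model("text-davinci-003")

chunk = "Pierre de Rauzan was a Bordeaux wine merchant..."
n_tokens = len(enc.encode(chunk))
print(n_tokens)

# Rough budget check: retrieved context + prompt + answer must all fit
# inside the model's ~4,097-token window.
assert n_tokens < 3000, "chunk too large to leave room for the answer"
```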
This will be our second-to-last question; I know there are so many questions out there, but I'll make sure to get them to our speakers and follow up with y'all. How much time does it take to retrieve the answer from the vector DB? I imagine it's based on the number of vectors. I have some documents I want to embed and upload to Pinecone; what approach would you recommend? I think you guys are the experts; James, maybe this is all you.

Sorry, can you repeat that? I was just reading the comments; there's so much, you guys are very active today, we appreciate it.

How much time does it take to retrieve the answer from the vector DB? I imagine it's based on the number of vectors.

Right. So it's very fast. Basically, most of the wait time is network latency, so it depends on where you're located. If you're using, say, Google Colab in a similar region to your Pinecone instance, you're going to be waiting something like 30 milliseconds, around that, as a typical number. That will increase slightly over time as you add more vectors, but then it decreases once you add more pods. So it goes up slightly as you add literally tens of millions of vectors, but even with a billion vectors it's still going to be pretty quick, so not too much of an issue.

Yeah, for practical purposes, the GPTflix app has 450,000 vectors, and the actual time to retrieve the top-k closest vectors is instant; it's just the query to the REST API that takes some time. I can't measure it; I've tried to see if it takes any time at all, and the response is instant for my purposes.

Cool. So we'll close on this question; this one is from me. Stephen and David, you have maybe unconventional backgrounds, so for somebody who's trying to build with the OP stack, are there any resources or things you consistently turn to for help in your building process? David, I see you nodding your head.

Yeah. How I found Pinecone and things like that was actually through James's YouTube; there are a lot of awesome tutorials on there. And a lot of it, too, is just not being afraid to go out and test it, try it, and learn from it, because not everything is straightforward with AI; you get some interesting results and weird things, so you just try a bunch of different approaches. And then Twitter as well: there's a lot of good content, people testing things, good ideas.

Yeah, exactly, same thing: just try things; there are lots of resources on Twitter. Another thing: you can look up LangChain; there's a whole community around LangChain doing some cool stuff. GPT Index has done some cool stuff too; you can find both of those on GitHub. And the OpenAI Cookbook is really good; it's been the most popular repo on GitHub for the last few weeks. They have some really great resources in there; some things don't work, but they're fixing them really quickly, so they're doing a pretty good job. I'm also going to open-source this little Streamlit app as soon as I have time to clean up the code and get my API keys out of it, so hopefully tomorrow I'll open-source that as well, if you want to poke around and see how it works.

There you go, very cool. James, I think it's fun that you were quoted as one of the resources. What are your resources?

I don't know... just tell them you look everything up on ChatGPT. Actually, no, please don't tell them that.

Funny thing about that: I will say I've used it to parse some obnoxious JSON files and things like that; it's been quite handy.

Cool. Well, thank you all for participating. I know there are a lot of questions out there; we will follow up and share this with everyone. This video will be available this week on our YouTube, and we'll share it via our social media. We're really happy you were all able to join us. And one thing I always miss from in-person events: I think we deserve to give ourselves a round of applause. So thank you, everyone; hope you join us again. Have a good one. Thanks so much.
Info
Channel: Pinecone
Views: 12,671
Id: -dZrNj2mVHo
Length: 48min 40sec (2920 seconds)
Published: Tue Feb 21 2023