What is a vector database? Why are they critical infrastructure for #ai #applications?

Captions
Awesome, all right. Well, hello everybody, thanks so much for taking the time to be with me today. My name is Zachary Proser, or Zach, and I'm a developer advocate at Pinecone. Today's talk is "What is a vector database?", and I'm very excited to be talking with all of you about it. I'll also share that my background is as an application developer: I've done front-end and back-end work, open source development, and a ton of infrastructure work. That's all relevant because I'm now four months into Pinecone, and I've been able to take great advantage of vector databases and the patterns we're going to talk about, despite the fact that I do not have a deep background in machine learning. I thought that might be interesting to folks here; I'm sure there are plenty of people on this call who know a lot more than I do.

In this talk I want to cover what a vector database is. I like to do that by first talking through what kinds of problems vector databases actually solve for us, and what kinds of applications we can build with one that would have been harder, or impossible, before. Then comes the part I'm most excited about: the architectures, patterns, and application types we can actually build using vector databases. I'll also do my best to leave time for Q&A at the end, and anything I don't know off the top of my head I'll bring back to the broader Pinecone team so we get your questions answered in a follow-up.

So we're going to go through a technical exploration of what a vector database is, how AI applications "think," and the critical role vector databases play in AI applications specifically. But first I thought it would be nice to ground our discussion in a real-world example. On this slide you can see three ambiguous natural-language English queries that you might expect a given application to receive from its front end. English is famous for being ambiguous: the first query, "Where is the Bank of England?", is about a financial institution; "the grassy bank" is a landscape feature, like a hillside; and "How does a plane bank?" is about turning gently while flying.

As a thought experiment, imagine that all we have is naive, text-based keyword-matching search, and we need to build an application that handles these three queries plus anything else a user might send in. On our back end you can picture some code that receives the user's query, chops it up into distinct words, and looks for keywords; once it finds keywords, it crawls through our whole database and returns any documents or rows containing the keyword "bank".
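As a minimal sketch, here is roughly what that naive keyword-matching back end might look like (the document corpus and function name are hypothetical, purely for illustration):

```python
# Naive keyword search: split the query into words and return any
# document that shares at least one word with it. There is no notion
# of meaning here at all.
DOCUMENTS = [
    "The Bank of England sets interest rates.",
    "We picnicked on the grassy bank beside the river.",
    "The pilot banks the aircraft to begin a gentle turn.",
]

def keyword_search(query: str) -> list[str]:
    keywords = set(query.lower().split())
    return [
        doc for doc in DOCUMENTS
        if keywords & set(doc.lower().split())
    ]

# Common words like "the" make every document match every query, and
# exact-token matching cannot tell the three senses of "bank" apart.
print(keyword_search("Where is the Bank of England?"))
```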
And as we've all experienced, keyword matching alone is not going to provide excellent performance or a high-quality experience for our end users, because we can't differentiate between the Bank of England, a place that holds money, and a plane making a soft turn while flying.

So in this talk we're going to look at what a vector database is, but as I said, I first like to understand the types of problems they solve. To do that, let's take a step back and ask: how does AI think? When we talk about "thinking" in the context of AI, what we're really talking about is the internal representation AI models have of the data we feed them, the data they're trained on, or the data they're looking at when servicing a user query. To understand AI we can borrow some metaphors from what we know about human intelligence. I'm sure you've all heard of neural networks: many types of AI learn using neural networks, which are interconnected layers of neurons, and one way to think about those layers of neurons is as stages of mathematical functions that have their own weights. As we feed these neural networks immense amounts of data, they are continuously learning, being trained, and these functions keep updating their internal weights, their parameters if you will.

What happens over time (and here's a very simple diagram) is that as these functions continue to be exposed to more and more data during training, they get to a point where they begin to extract high-level features. Features are the essence of a thing, what makes the thing the thing. If we're training an image model to eventually say "that's a cat," the features that make up "cat" could be the soft fur, a particular type of eyes, the claws, the size in relation to other animals, and so on. What's important to understand is that eventually we as humans come to feel that these AI models have an understanding of the data they're presented with, and what's really important for this talk, and for vector databases in general, is that the model's internal representation of those features is done with vectors, or embeddings.

Let's again ground this in a particular example, with apologies to anybody on the call who might have a fear of clowns; I promise there will be no jump scares whatsoever. Imagine we are training a neural network on millions of images of clowns, and we want to get to a point where our AI model can recognize "this is a clown" or "this is not a clown"; it can do discriminative work in that fashion. In the first image we see some clowns playing in a park, doing different things: holding a balloon, pointing, maybe telling jokes. We continuously feed our neural network more and more data and images of clowns; perhaps we also see clowns performing tricks or performing at birthday parties. Eventually the AI model extracts the high-level features from the data. It has seen so many clowns, and it keeps updating the weights in those neurons, in the functional layers of the neural network,
and eventually it comes to determine that there are certain high-level features which represent "clown". Importantly, these are things that even we as humans would agree represent a clown: the very colorful costume, the frizzy wig, maybe a red balloon, the big floppy shoes, the red nose. The same way humans think "okay, that is or is not a clown" because of these things, the model extracts high-level features, and the way the model represents those features internally is through vectors, or embeddings.

That begs the natural question: what are embeddings and vectors? First, the important thing to know is that they are synonyms; they're the same thing, and we use the terms interchangeably. At a high level, this diagram demonstrates how you get embeddings. The technical definition is that embeddings are numerical representations of the essential features of data, and importantly, not just the features but the relationships between discrete objects: a large clown and a small clown are related because they're both clowns, even though they're different. The data could be words, documents, and more, and it's represented in a continuous vector space; we'll look at what that means more concretely in a second. The really important thing to understand here is the flow: object one, object two, object three, which is data in the abstract sense, being sent into an embedding model. The embedding model extracts the features it sees, and it represents those features via embeddings, or vectors, which are lists of floating-point numbers.

What's very cool about this, and what represents a paradigm shift away from traditional database modeling, is that this can be any type of data; it doesn't have to be text. We're also talking about images, being able to do comparison and search across image space; audio, being able to match voices and people, and to discriminate between the noise of somebody dropping and breaking a cup versus somebody singing a famous song; video; and so on. This becomes extremely powerful. The other thing to note is that there are many different types of embedding models, and you may already be familiar with some. One of the more popular ones, for example, is OpenAI's text-embedding-ada-002; I'm sure a bunch of us have used it, and there are both open source and closed source embedding models. At a high level, to get embeddings we feed data of any kind into an embedding model, and out come the vectors, or embeddings, which represent those features and relationships.

Let's try to visualize this a little more. This is a visual representation of a high-dimensional vector space; imagine it's a Pinecone database you've already stood up and stored a bunch of vectors in. When we talk about interacting with a vector database, we talk about "upserting" vectors, which really just means inserting them, putting them in the database, such as Pinecone's. The way vector databases work at a high level is to group together vectors that are similar, that share features, and to place far apart those vectors and pieces of data that are dissimilar from each other.
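To make "similar vectors sit close together" concrete, here is a toy sketch using cosine similarity; the three-dimensional vectors are invented for illustration (real embedding models emit hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two vectors: close to 1.0 means
    # pointing the same way (similar), near 0.0 means unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented 3-d "embeddings"; a real model might emit 1536 dimensions.
friday   = np.array([0.90, 0.10, 0.05])  # day of the week
monday   = np.array([0.85, 0.15, 0.05])  # day of the week
airplane = np.array([0.10, 0.20, 0.95])  # something else entirely

print(cosine_similarity(friday, monday))    # high: near neighbors
print(cosine_similarity(friday, airplane))  # low: far apart in space
```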
Here we can see represented in the high-dimensional vector space that these are not just names or phrases. You might have seen a western movie where somebody names their horse Friday, but "Friday" is not just a name: these are all days of the week, they share that feature, and therefore they are grouped together closely as neighbors in the high-dimensional vector space. The data points represented in blue, which could be a dog, an airplane, a mathematical function, are dissimilar, and the vector database represents that as well. So it captures not just the features that make each item unique but also the relationships between items, and that is a function of how close together, or far apart, they are represented in vector space.

Now that we've got this understanding of vectors and embeddings and why they're interesting, let's return to our original example, where we were previously using naive keyword matching. We can now consider, in the context of vectors, those same three ambiguous natural-language queries from our users. "Where is the Bank of England?" is again about a place that stores money safely. "Where is the grassy bank?", a landscape feature, sits further apart because the semantic meaning of the sentence is dissimilar. And "How does a plane bank, or turn, in flight?" is even more dissimilar from financial institutions, so it's spaced even further apart. This is really the essence of what allows us to build applications that seem incredibly intelligent, that can determine exactly what we're talking about and find highly relevant search results.

Notice also that at the bottom of the slide we now have two new sentences. The first is "The bees decided to have a mutiny against their queen." The second, in completely different keywords and syntax, is "Flying stinging insects rebelled in opposition to the matriarch." The resulting vectors, after we pass these through the embedding model, are quite similar, because the sentences are essentially identical in meaning; they just use different words to convey it, so we would expect them to be very close together in vector space. This shows that the power of vectorizing your data is that you can model relationships at a very granular level, in addition to the features we talked about with the neural networks.

So now that we understand why vectors are powerful, let's look at vector databases and how they work. A vector database, loosely defined, is a database that allows us to manage embeddings at scale. Importantly, some vector databases, such as Pinecone's, allow you to not only insert, update, and delete vectors but also update the index which holds those vectors instantaneously. For the folks on the call even more familiar with ML and big data than I am: if you've worked with architectures where you need to wait while the index rebuilds, that's not an issue here; the index updates instantaneously. There are use cases where you can imagine that's incredibly powerful, such as e-commerce, where you want users talking to your chatbot to see the
latest products and the current inventory immediately. As you sell out of a famous Nike shoe, the index immediately reflects that there are none left, and if you do a new product launch, that class of products is immediately available within vector space for anything querying it.

The other thing that makes vector databases so powerful is that once your vectors are upserted into the high-dimensional vector space, you can query the database by giving it a query vector (we'll look concretely at what a query vector is in a moment). You say, "give me all of the nearest neighbors, the most similar vectors, given my query vector," and the database very quickly returns the nearest neighbors, the most similar items based on those features represented in the embedding's list of numbers.

Finally, it's important to understand that some vector databases, Pinecone included, allow you to attach arbitrary metadata. If you're familiar with JavaScript, think of metadata as an arbitrary object with any fields you want. When you upsert your vector, which could represent a chunk of text or dialogue, you have the option of including whatever metadata you want. Imagine you're building a chatbot that represents a television show character, and you want that chatbot to answer as if it were that character. The vector might be, say, a plot summary of a given episode, and the metadata you associate with it might be the season number, the episode number, the runtime of that episode, and the writer of that episode. The point of metadata is that when you query the nearest vectors back into your application, you can use it however you wish to enrich the UX for your end user. Metadata is the bridge from ambiguous language, which goes into vectors, to structured data that you assign to your vectors.

Then a quick mention of what is interesting or different about Pinecone in particular. It's a crowded market; as you've all seen, there are tons of different vector databases, and companies that traditionally haven't built vector databases are creating and adding their own, to open source projects and elsewhere. The thing about Pinecone that's unique is that we were designed from the ground up to be fully managed and run in the cloud. We run in AWS, Google Cloud, and Azure, whichever cloud you prefer, but you don't need to worry about patching, security updates, or "there's this engine issue and I need to go upgrade my database," because our engineering team handles that for you behind the scenes. You can create, manage, and scale indexes all through the API, or the console if you prefer; I just saw a cool update where you can go into the console and drag and drop a CSV file into an existing index to get your vectors in that way. And we handle scalable ingestion: we can handle hundreds of millions and even billions of vectors, and even at billions of vectors, the time it takes to query across that vector space and return highly relevant results is sub-100 milliseconds.
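To make the upsert-plus-metadata flow concrete, here is a minimal sketch using the Pinecone Python client; the index name, record ID, and episode fields are hypothetical, and the exact client surface varies by version:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")   # placeholder key
index = pc.Index("tv-show-episodes")    # hypothetical index name

# summary_vector would come from an embedding model; it is truncated
# here for illustration (a real index expects e.g. 1536 dimensions).
summary_vector = [0.012, -0.034, 0.055]

index.upsert(vectors=[{
    "id": "episode-s02e01",            # hypothetical record ID
    "values": summary_vector,
    "metadata": {                      # arbitrary, user-defined fields
        "season": 2,
        "episode": 1,
        "runtime_minutes": 22,
        "writer": "Jane Doe",          # hypothetical
    },
}])
```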
That latency matters because if you're building an AI application, or a generative AI application like the examples we're about to look at, you don't want your back end, especially your vector database, to add any noticeable lag to your UI. Getting back highly relevant search results in milliseconds, even across hundreds of millions or billions of vectors, is very important.

Finally, I'll mention that, as I said, I'm an application developer, and I've used every type of database represented here, as I'm sure you have: key-value stores like Redis, document stores like MongoDB, graph databases like Neo4j, and so on, and then there are vector databases. We feel that vector databases need to be purpose-built from the ground up, because they require specialized indexing, query optimizers, and specialized storage in order to organize and retrieve data efficiently. It's not just about speed; it's also about ensuring that the relevance of what you get back when you query is extremely high.

With that out of the way, we get to the part I'm most excited about as an application developer: what can we actually build with vector databases, and what types of applications can we make that we couldn't before? First, imagine we've got the same query from a user, "Where is the Bank of England?" The architecture I'm showing here is called semantic search, and it is the foundational architecture that you're going to see repeated again and again in more advanced applications; once you understand this part, you can pretty much go anywhere with vector databases that you want. Essentially, semantic search is how we ensure we always return accurate results for those same ambiguous user queries we looked at in the beginning. Unlike naive keyword matching, semantic search takes the user's query and converts it into a vector by passing it through the same embedding model that our content went through. What comes out is the query vector: just like our content, a vector whose features are represented as floating-point numbers in an array. We send that through the Pinecone API and ask the vector database to search all of vector space and find the top three, or top five, or top thousand, whatever you want, nearest neighbors, and return those to us. This is incredibly powerful, as we'll see in a second. And again, you can attach whatever metadata you want during the upsert call, which is what we're seeing here in the vector database, with metadata attached to each vector; you're not just getting back the nearest-neighbor vectors, you're also getting back whatever arbitrary metadata you defined. Now these highly relevant results go back to the application; you can imagine your application also has an API key and is perhaps calling ChatGPT and OpenAI, and now it has highly relevant results that match the user's actual semantic meaning. That is the foundational shift here: with a semantic search architecture we are actually looking at what the user means, at a semantic level, when they query our system, and that's what a vector database such as Pinecone enables. This is what allows us to build the really intelligent AI chatbots you've seen.
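Here is a minimal sketch of that semantic search query path, assuming an OpenAI embedding model and the Pinecone Python client; the index name is hypothetical and signatures can vary by client version:

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI(api_key="YOUR_OPENAI_KEY")            # placeholder
index = Pinecone(api_key="YOUR_PINECONE_KEY").Index("docs")  # hypothetical

# 1. Convert the ambiguous natural-language query into a query vector
#    using the same embedding model the content went through.
query = "Where is the Bank of England?"
query_vector = openai_client.embeddings.create(
    model="text-embedding-ada-002",
    input=query,
).data[0].embedding

# 2. Ask the vector database for the nearest neighbors, plus whatever
#    metadata was attached at upsert time.
results = index.query(vector=query_vector, top_k=5, include_metadata=True)
for match in results.matches:
    print(match.id, match.score, match.metadata)
```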
Let's now look at this in the context of a very hot topic that we're seeing lots of interest in right now. LLMs, or large language models, such as ChatGPT, suffer from a problem called hallucination, so let's take a quick detour to understand what that means and how the same architecture we just looked at can solve this thorny problem. If you've ever used ChatGPT, you know that eventually it will say something like, "As an AI language model, I don't know about current events, because my training cutoff date was September 2021"; anything that's happened since then, it hasn't seen and doesn't know. Eventually OpenAI needed to define a cutoff point, whether because of the cost of training or because they wanted to get the model to market, and everybody building what we call foundation models, like ChatGPT's, faces the same problem: eventually you need to say, this is the end of the data, this is everything we scraped from the internet, all the books we want to feed it, all the conversations, and we need to stop.

So if you just query the base LLM, if you just query ChatGPT's model alone and ask, "How many customers do we have in New York City?", you might get a response that says, "New York City has 88.4 million potential customers as of 2021." What's interesting about that response is that grammatically it's perfect and syntactically it seems sound, but it's factually completely incorrect, a fabrication: New York has 8.5 million people living in it, so there can't be almost ten times that number of potential customers. The worst and sneakiest part is that, because of how LLMs are built and designed from the ground up, they do not understand that they are hallucinating, so they will give you back this answer very happily and let you run with it.

Now let's examine why, in the real world, if you're producing non-trivial applications with generative AI, or even just exposing chatbots for support, this can become a serious issue: it can cause not just brand damage to your company but potentially real-world harm. Imagine a different scenario. A user who has purchased a Volvo is on the Volvo support website, which has a chatbot, and their question is, "How do I turn off the automatic reverse braking on this particular Volvo, the Volvo XC60?" Suppose we send that query directly to the generative AI model, just like typing into ChatGPT at chat.
openai.com. It might come back with something that looks completely grammatically correct and syntactically plausible, and again it's leading this user down a complete rabbit hole, because it's hallucinating. Everything it says seems reasonable: press the settings button, go to driver assistance settings, select park assist, look for the option to turn off automatic braking. But because the foundation model behind ChatGPT has never been trained on Volvo's actual user manuals, or any of its actual instructions for vehicles, it is literally fabricating this out of thin air based on what sounds linguistically most correct. Again we get a plausible-sounding but total hallucination, which in the context of trying to turn off a safety feature in your car could cause real-world harm or be dangerous.

So let's now look at an architecture, a pattern, that we're seeing a tremendous amount of interest in; you may have already seen it on the internet. It's called retrieval augmented generation, acronym RAG. It's not the only way to fix the hallucination problem, but in our opinion it's the most cost-effective way, and also the most tenable. As I shared at the beginning of the call, my background is in application development and infrastructure automation; I don't have ML training, but I am very comfortable with programming languages, front end and back end. For example, I took a three-week break before I started at Pinecone a couple of months ago and built an AI bot that impersonated Michael Scott from the television show The Office using retrieval augmented generation, and after looking at an example I was able to implement it myself in Python and JavaScript without much trouble. This is really powerful because companies tend to have more application-developer expertise than machine learning and data science expertise; ML expertise certainly helps, but with retrieval augmented generation your application developers can essentially build it.

Let's look at the RAG architecture. There are really two phases. The first phase is us, as developers, engineers, and machine learning experts, working together to prepare our data store before launching, say, a chatbot. We want to create a contextually specific database containing all of the information from our company. We might read URLs from our own documentation, read our API documentation, and pull in PDFs full of our service offerings and manuals that explain how different products actually work. We can read all of those sources in code, chunk them into documents (which is just a pattern of making them a little smaller), and pass them to the embedding model, and just as before, out come vector embeddings, the lists of features as floating-point numbers. We can optionally attach metadata; for a product we might say the shoe is orange and the price is $99, whatever we want. That goes through the API, and at the end we have a fully populated vector database that contains context-specific information about our company, our proprietary data, our actual subject matter.
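Here is a minimal sketch of that ingestion phase; the chunking helper, file name, and index name are hypothetical, and any embedding model or chunking strategy could be swapped in:

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI(api_key="YOUR_OPENAI_KEY")               # placeholder
index = Pinecone(api_key="YOUR_PINECONE_KEY").Index("manuals")  # hypothetical

def chunk(text: str, size: int = 1000) -> list[str]:
    # Naive fixed-size chunking; real pipelines often split on
    # sentences or sections and overlap adjacent chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

# Hypothetical source document read from disk.
documents = {"xc60-manual": open("xc60_manual.txt").read()}

records = []
for doc_id, text in documents.items():
    for n, piece in enumerate(chunk(text)):
        embedding = openai_client.embeddings.create(
            model="text-embedding-ada-002",
            input=piece,
        ).data[0].embedding
        records.append({
            "id": f"{doc_id}-{n}",
            "values": embedding,
            # Keep the raw text in metadata so it can be handed to an
            # LLM as context at query time.
            "metadata": {"source": doc_id, "text": piece},
        })

index.upsert(vectors=records)  # populate the context-specific database
```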
When I built this for myself for fun, the vector database I used was completely populated with human-written summaries of every episode of The Office, so my chatbot was able to refer to that. The way the chatbot, or generative AI application, refers to the database is defined in the second phase of the RAG architecture here, and if you're paying attention you'll notice it's quite similar to semantic search. That's because it uses semantic search under the hood; the essence is the same, but the results are pretty game-changing. Again, the user asks an ambiguous natural-language question; we convert it into a query vector; it goes through the Pinecone API; but now what Pinecone's database returns is the actual content we loaded into the database for our use case. It could be all kinds of facts about the Bank of England: how much money it has on hand on a given day, its history, where it was founded. If we were making a bot meant to be an expert on the Bank of England, we would have populated the database with all of those documents, which means that whenever a user asks a question of our chatbot or AI application, we are actually looking at their semantic meaning and returning our own documents that are the most semantically similar to what they asked about. That is incredibly powerful.

Let's look at that same example with retrieval augmented generation. The user asks, "How do I turn off automatic reverse braking on this Volvo XC60?" The big idea is that the large language model at the center needs access to a data store of context and real-world facts. Imagine we're working as engineers and data science experts at Volvo: we could take all of the Volvo user manuals, convert them to whatever format, markdown or PDF, pass them through the embedding model, and as we've seen, out come embeddings and vectors that go into the vector database. Now this vector database is absolutely replete with factual information about how every model in the Volvo lineup actually works: how you interact with its features, how you turn settings on and off, what the specs are for each car. You can imagine the metadata we attach to our vectors as we upsert them might be make, model, year, color, whether or not it's an electric vehicle, true or false; it's whatever you want to define.

Now the world looks quite different for this user, with just a couple of differences in the query path. Our user asks the exact same question: how do I turn off automatic reverse braking on the Volvo XC60? We pass their query, in our application code, whatever our back-end language is, through the same embedding model, and we get out the query vector as we've seen before. That query vector goes to the same vector database we've made into a repository of Volvo user manuals, and what finally reaches the LLM, whether it's GPT-4 or any other, is not just the user's query in natural language but also the context. If they're asking about the Volvo XC60 and about reverse braking, the context is going to say: this user is asking about the Volvo XC60, here are all the manuals and documents about the Volvo XC60 and automatic reverse braking; now, ChatGPT, with all of this information, please answer the user's question.
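And a minimal sketch of that query-time path, continuing the hypothetical objects from the ingestion sketch above (the prompt wording and model choice are illustrative, not prescriptive):

```python
# Reuses `openai_client` and `index` from the ingestion sketch.
question = "How do I turn off automatic reverse braking on the Volvo XC60?"

# 1. Embed the question with the same model used at ingestion time.
query_vector = openai_client.embeddings.create(
    model="text-embedding-ada-002",
    input=question,
).data[0].embedding

# 2. Retrieve the most semantically similar manual chunks.
results = index.query(vector=query_vector, top_k=3, include_metadata=True)
context = "\n\n".join(m.metadata["text"] for m in results.matches)

# 3. Hand the LLM the retrieved facts alongside the user's question.
response = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": f"Answer using only this context:\n\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```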
This is where the game changes completely in terms of relevance and quality of response, because now the LLM is able to look at factual data. It's still smart enough to answer the user's question, but the answer is actually grounded in the truth we added to the vector database: "The driver can choose to deactivate Auto Brake with Rear Auto Brake and Cross Traffic Alert. The warning signal can be deactivated separately. Activate or deactivate auto brake with this button in the parking camera view." This factual answer is completely different from the previous hallucination, and this retrieval augmented generation pattern, or architecture, has allowed us to fix the problem of LLM hallucinations within our generative AI application. That is what makes it so powerful.

We're getting up to time here and I want to leave room for questions, but I also want to make it clear that while generative AI is exploding, as we all know, and there's a tremendous amount of awesome stuff you can build that's fun, that's making money, with tremendous forecasts of what generative AI alone can do for businesses, that's a small part of what vector databases enable. As I mentioned before, vector databases also enable semantic and similarity search, as we saw, not just for text but also for images, audio, video, and objects within video frames. We saw how a vector database can ground chatbots and support bots in proprietary or context-specific data your company might have, and how it fixes hallucinations via retrieval augmented generation, and that's just a few of the potential use cases. I can also share, if anyone's interested, later in this call or via screen share: we've got tons of Jupyter notebook examples in GitHub that demonstrate different patterns, all using a vector database, and you'd be surprised at the range of applications represented by those notebooks. For the TypeScript and JavaScript developers and application developers on the call, we've also got Vercel templates that implement retrieval augmented generation, are easy to launch, and demonstrate RAG from a code perspective. But again, this is just a small piece of what you can do with vector databases. The true power is being able to get at the actual semantic meaning of a user's query and then find them the most similar results; the level of relevance you can achieve doing that with a vector database at scale is pretty phenomenal, and that's what enables these hyper-intelligent chatbot experiences we're all starting to see crop up.

All right, with that I'll close by saying thank you all so much for your time and attention; thank you for being here with me today and allowing me to come speak with you, I really appreciate it. I hope this was useful, and feedback is welcome about how we could make this better for you in the future. I'll now open it up to any questions, and if it's not something I know off the top of my head, this is being recorded and transcribed, and I promise I will go back to the broader Pinecone team and make sure we get your question answered in a timely fashion. All right, thanks so much. And it looks like, Evites, if I'm saying that correctly, you've got your hand raised; go for it.

Hey, okay, so I can see that the
primary advantage of using the RAG architecture with the vector database is that we don't need to retrain our large language model, right? I can still use my same base model, whatever we have: BERT, GPT, LLaMA 2. But is there a limit to what extent we can survive without going back to fixing or retraining my base LLM and just using the vector database? Is there a limit to that, or is it able to handle anything?

Yeah, that's a really excellent question. I probably should have included a slide on this, and we have slides in other presentations about it, but I want to make it clear that, as you pointed out correctly, RAG is not the only way. You could go back to your base model and fine-tune it, for example, but even that requires ML expertise and data scientists that you may or may not have on staff. RAG just happens to be what we find to be the most cost-effective approach, because it's also quick to implement at the application level. To answer your question about when it starts to show wear and tear from a scale perspective: with Pinecone that's not really an issue, because, again, we can handle hundreds of millions or billions of vectors very quickly. I do think it's an interesting question, and I don't know the answer perfectly myself yet, about where this is going. There's been lots of discussion on social media, you've probably seen it, where some people claim RAG is a hack: that it's really addressing the fact that the context window for these LLMs is limited. So at what point do the OpenAIs of the world, and other models, and open source, which is catching up so fast or maybe in some ways leading the way, expand their context windows and make RAG less and less useful? That's entirely possible; as you all know, the entire space is changing so quickly.

The other thing I'll mention is that normally, when we show the slide that lays out your options by difficulty and speed for fixing hallucinations, the first is prompt engineering. You can get pretty far with prompt engineering; you can say, "in the future, don't say that weird thing you said," and "please browse the internet for the user," and so on. But as we also see on Hacker News every day, prompt engineering is not perfectly reliable. You can jailbreak LLMs very easily just by asking questions; I was playing with the latest DALL-E 3 integration for OpenAI, which is incredible and super fun, but if you put certain text in an image, it will just do what you ask and drop the whole prompt. So not even prompt engineering is perfect, and it will only get you so far. It also can't help with dynamic context injection: imagine you've got the entire library of classical literature in your Pinecone database or some other vector database; RAG is still going to be more effective at actually finding the exact results the user is asking for and immediately returning them. But you raise an excellent question about how long this will be the case, and I don't personally know whether it will always be needed. I just know that right now it works pretty well,
and I can say that from personal experience, having implemented it myself. In terms of actual scale on the back end, there doesn't appear to be any meaningful limit yet that I've seen, at least on the vector side of things.

Okay, because I think having that limit, or an understanding of what the limitations are, would at least prompt an organization to say: hey, we're reaching this limit, maybe it's now time to fine-tune my underlying model; I cannot just rely on my RAG anymore.

That's an excellent point, yes, thank you, and I think I'll update the slide with that in the future, because you're absolutely correct. There are certain times and places where RAG is appropriate: if you're doing a chatbot or support bot, it's excellent, it's super quick, you could have an application developer build it in two days and there you go. But when you're dealing with high scale, or very specific use cases where you're trying to do something only your company knows, and you also have the expertise on staff, perhaps fine-tuning is better; it just takes longer and there's a cost to it. And then of course you can go as far as building your own foundation model, like ChatGPT, but that's cost-prohibitive for most organizations. I think Sam Altman of OpenAI shared that training ChatGPT was estimated at over $100 million in compute costs alone.

Right, yeah. I mean, there aren't that many; at least in finance there's BloombergGPT, but again, it's really big organizations with a lot of funding that can afford to do that, and a number of other organizations can't. And in terms of cost: if I'm comparing designing and using this vector database versus fine-tuning, is there a cost comparison we can make, between building the vector database up plus the ongoing cost to maintain it, versus fine-tuning on a regular basis? Do you have any idea of a cost estimate for that?

That is an excellent question. I can tell you, and I think I can share this, that internally we're working on some tooling to make it easier to understand what your cost would be, and we're also looking at tooling that might help you explore and play around with different scenarios and approaches that do or don't use the vector DB, and do or don't use certain models. I think it's a great question, and I think it also comes down to your team: if you've got deep data science expertise, which I don't personally have, then fine-tuning is not that scary and you can handle it that way. But I can also share that even at high scale, for certain organizations, there's a sweet spot where even Pinecone-level performance is very affordable. I don't have those figures for you yet, but I think there's also a data sheet that one of my colleagues just put together, and if it contains any of that information I'll make sure to get it to you, because it goes into comparisons around how fast we are on average and what scale we're actually talking about for certain use cases.

Okay, and sorry, one last question, please.
Excellent questions, go ahead. In terms of embeddings, the embedding dimensions: I'm guessing those are user-defined, right? We can define what dimension is used based on my base model, correct?

Absolutely. When you create an index via the API call (we've got a Python client; I'm a developer advocate, but the other half of my job is open source maintenance, so I work on the Python client, the TypeScript client, and others, along with my colleagues), one of the parameters is dimensions. So you can define, say, 1536 if you're using OpenAI's text-embedding-ada-002, or whatever your base model requires; you specify that when creating your index.
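For reference, a minimal sketch of that index-creation call with the Pinecone Python client; the index name and cloud/region values are hypothetical, and the exact spec argument varies by client version and plan:

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder key

# dimension must match your embedding model's output size,
# e.g. 1536 for OpenAI's text-embedding-ada-002.
pc.create_index(
    name="my-index",                   # hypothetical name
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
```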
Okay, thank you so much. Of course, excellent questions, thank you. Any other questions I can help out with? I've made a mental note to get the data sheet for you folks and send that around, if that's interesting, and to ask whether we already have a cost calculator and, if not, how soon one might be coming. Any questions from the audience? By the way, even if you think of something later, I'll type my email in here, it's z.o, and I am always happy to field questions; you can also find me on LinkedIn, I'll put that on here too. And like I said, anything I don't know, I'll take back to engineering or sales or whoever the right folks are to ask.

If you have questions about the open source stuff, by the way, I do work on that, so maybe I'll quickly share this with you, because I think we do a really good job of maintaining these, and there's a tremendous amount of work behind them. My colleague James, for example, you might have seen his YouTube videos; he's definitely a rock star. For all of these notebooks we've now added a video about how to use them, and there's a guide about how to run them for free in Google Colab. So please feel free: these are open source, and as you can see you don't need to be logged in, you don't need to be a Pinecone customer. We even have a free tier, which I was using before I joined Pinecone; that free tier lets you use one index, and you can really use it, I put up a production chatbot and deployed it on that. We go into all the different use cases here, like chatbots, and NeMo Guardrails, which is super interesting, something James just figured out, with the video just released: a framework for putting guardrails on the LLMs we just talked about, where you can say things like "don't answer questions about policy" and "don't be offensive," and actually start defining some of that in here. For all of our examples, if you click "Open in Colab," it's free to run them in Google Colab, and all you need is to get your API key from Pinecone, which is also free. OpenAI I think is free unless you want faster performance, in which case you pay for it, it's 20 bucks a month. But just to say that these exist: if any of you are open source or data science or Jupyter notebook hackers, check out this link as well, and I'll drop it in the chat so anyone can copy it. If you have questions about this, it's kind of my responsibility too, so I'm happy to answer questions and fix bugs; feel free to open issues if you have any questions.

But yeah, otherwise I just want to say thank you so much for your time and attention, I really, really appreciate it. This was a lot of fun for me, and I hope it was useful for all of you.
Info
Channel: Pinecone
Views: 5,847
Id: wc3Lh-eiNBM
Length: 43min 31sec (2611 seconds)
Published: Thu Oct 19 2023