What is a Vector Database?

Captions
Over the past year, we can all agree that AI applications have really captured the imagination of everyone. This groundbreaking technology has really revolutionized how we will be computing, now and in the future. As I took a deep dive to really understand how it works in the background, it led me to our topic for today: what is a vector database?

Let's take a stroll down memory lane and look at some other groundbreaking moments in technology when it comes to the database area. First, we all know about SQL, which stores structured data in tables. It's been around for decades, and I think we're all aware of where it's been. Then came NoSQL, which takes unstructured data in the form of documents, and this has been great for a lot of real-time web applications as well as big data. Then, with the advent of mobile, when we needed to collapse a lot of these REST APIs, came the graph, which stores data in nodes, and that's how it formulates a lot of its relationships. That takes us to where we are now with the vector database, which is very useful for, and supplemental to, all our AI applications.

So now let's get into the characteristics of a vector database. When I started my research, I realized there are two major concepts that I had to really get down. The first is what is a vector, and the second is what is an embedding. I'm really going to simplify things. Think about a vector as an array of data that gets put into the database. Any type of complex object you put in, whether it's images, text, or documents, gets represented as some type of numerical value, so I'm going to write this as an array. Keep in mind that you're not only going to have user data that you put in. This is also going to be a database where a lot of your large language models can store their save points and the actual data sets they use for comparison, as we'll see when we get to the use cases. So, as data scales up, in order to keep the relationships, the embedding is lots of vectors that are saved in a multi-dimensional format (I'll abbreviate that here), where they can then be used as groupings of vectors for data sets that can really start to grow.

Now, with this understanding of vectors and embeddings, we have the proper context to discuss the use cases that really bring this technology to life. We have our large language models, and we've all interacted with a chatbot in the past. I think for everybody that's the number one way you've interacted, especially if you've used ChatGPT. The major thing that uses is a concept called natural language processing, so let's start with chatbots. It's the number one feature, I would say, where you'll see this being used. It works by understanding the context and the semantics of a conversation, and the model is able to leverage a vector database as its ever-growing store for understanding that a car is similar to an engine, or the relationships between the terms that you have here.

I also have video and image recognition. We've all used these types of applications to build AI art, as they call it. Or take voice recognition: the ability to take sound waves or an audio file, represent it as a numerical set of data, and then make comparisons, such as that this equals this particular semantics of speech.

And last but not least, let's talk about search, which is also very important. We may have similarity searches, such as being able to identify certain images, and we've all interacted with recommendation engines. So search is another big one here, and we'll just say similarities to summarize that. The important thing to understand is that when I'm searching for what is related to something, those relationships can definitely be represented there.

Now let's get into the benefits of doing this, because if you do a quick search on the internet, you'll see that you can represent vector data in some of the other databases I discussed earlier, SQL databases and NoSQL databases. But you truly get certain great benefits when you use a dedicated vector database. Number one, I would say, is flexibility: the ability to take any type of data, whether it's docs, images, any type of text data, or speech. You heard a lot of the things I was discussing, and it doesn't matter which. When you use a different type of database, you may have to prepare your data to go in, but with a vector database it's very easy to just insert a bunch of unstructured data for comparison. So the first benefit is being able to ingest any type of data.

The second is scalability: being able to scale out to millions and billions of data points, of vectors, that you'll have for comparison. If you think about it, this is where the power of large language models really comes to shine, with this extensive database that the model has for comparison. If you wanted to start from scratch with your model, you would often have to give it a bunch of training data for it to start to grow and maintain some of its expertise.

So we have flexibility, the ability to put data in, and scalability, the ability to scale to millions or billions of data points. Once that data is in, let's not forget the speed and performance: being able to index a lot of these vectors, these embeddings, and being able to query with low latency. Since it's all in a numerical format, it's very easy to run queries. If a chatbot is trying to take the conversation and do some comparison, the large language model is going to leverage this vector database to put save points, if I want to call them that, or you could probably just say a cache of data that it can use to make that operation go, that algorithm work, and whatever workflow you have perform like it should.

So this has been vector databases. I'm always an advocate of being polyglot, meaning that your architecture can have many different types of technologies, multiple databases as a matter of fact. You really don't have to always depend on one type of database. But one thing we can agree on: you're all technologists like myself, and you're all starting to think about how you can infuse AI into your architecture. I recommend that you take that next step, look at some of the open source technologies for vector databases, and get off to the races.

Thanks for watching. In the comments below, let us know how you've used vector databases with your AI projects, and as always, please remember to like and subscribe.
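
Editor's sketch (not part of the video): the similarity search described in the captions can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not how any particular vector database works. The three-dimensional vectors are hand-written stand-ins for embeddings, the names `store`, `cosine_similarity`, and `nearest` are hypothetical, and the linear scan replaces the indexing a real system would use to stay fast at millions of vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-written 3-dimensional "embeddings" for illustration only.
# A real embedding model would produce hundreds or thousands of dimensions.
store = {
    "car": [0.9, 0.8, 0.1],
    "engine": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

def nearest(query, store, k=2):
    """Return the k keys whose vectors are most similar to the query vector.

    This is a brute-force scan over every stored vector; it is an assumption
    made for clarity, not the indexing strategy of a real vector database.
    """
    scored = [(cosine_similarity(query, vec), key) for key, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]

if __name__ == "__main__":
    # "car" is closest to itself, then to "engine", echoing the transcript's example.
    print(nearest(store["car"], store, k=2))
```

With these made-up numbers, querying with the "car" vector ranks "engine" far above "banana", which is the kind of relationship between terms the speaker describes.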
Info
Channel: IBM Technology
Views: 60,243
Keywords: IBM, IBM Cloud, DB, Database, Data base, Vector DB, Vector Database, Artificial Intelligence, AI, Generative AI, GenAI, Gen AI, Machine Learning, ML, Vector
Id: t9IDoenf-lo
Length: 8min 11sec (491 seconds)
Published: Mon Mar 04 2024