Vector Database Explained | What is Vector Database?

Captions
There are some AI startups that have raised millions of dollars in funding, and they have one product in common: a vector database. Let's try to understand what exactly a vector database is.

Today, when you search Google for "calories in apple" versus "employees in apple", Google figures out that the first apple means the fruit and the second one is a company. Have you ever wondered how Google does this? It uses a technique called semantic search. Semantic search means not searching by exact keyword matching, but understanding the intent of a user query and using the context to perform the search. To do semantic search, it internally uses the concept of an embedding. A word embedding or sentence embedding is nothing but a numerical representation of text.

Let's first understand how exactly an embedding works. Let's figure out how you can represent the word "Apple" as a numeric representation, given this particular context. One way is to think about different features or properties of words. Here you can have "related to phones", "is a location", "has stock", etc. as properties, and then you assign a value to each of these properties. Revenue here means 82 billion dollars. You get a sequence of numbers as a result, and that is nothing but a vector. So this vector is a word embedding for the word "Apple" in this particular context. If you're talking about apple the fruit, then the embedding might look different, because the values of these properties are different.

When you have the embeddings for different words, looking at the embeddings you can say that the second apple and the word "orange" are similar, because their values match. Of course there are some values which do not match, but compared to the first vector, the second and third vectors are kind of similar. In the same way, if you have, let's say, "Samsung" as a word, you can turn that into a numeric representation: is it related to phones? Yes. Is it a location? No. And when you look again at all these vectors, you can
figure out that the first and the fourth vectors are kind of similar. So using these vectors you can figure out similarity. And not just similarity: you can actually do complex arithmetic such as this. This is a famous example in the NLP domain, where you can perform this mathematics using a technique called word2vec. Word2vec is a technique for representing a word as a numeric vector. I have made a separate video, so if you want to know more about it you can go and look at this video. In that video I have explained how you can generate handcrafted features for each of these and do this particular math. Now, just for intuition, I explained everything using handcrafted features, but in reality you use some complex statistical techniques to generate these word embeddings. Again, if you're curious, you can watch those two videos, or this particular video on BERT.

So far, let's say you have this understanding: there are a variety of techniques that you can use to represent a word, a sentence, or even a document as an embedding, and here are just some of the different techniques that are being used. In the ChatGPT era, obviously, Transformer-based embedding techniques are getting popular. So when you're using the OpenAI API for embeddings, you know what technique it is using underneath.

When you are building any text-based AI application, you might have thousands or even millions of embedding vectors, and you need to store them somewhere. When you think about storing them, the first option that comes to mind is a traditional relational database. So let's say for our use case we have these four articles: the first two are related to apple the fruit, the remaining are about Apple the company. You will first generate the embeddings, let's say using the OpenAI API, and then you will save them into, let's say, your SQL database. Now when you have a search query, you will also generate an embedding for that, and you will try to compare this embedding with the stored embeddings and try to retrieve the
relevant documents. Here you will use the concept of cosine similarity to retrieve the matching vectors, and you can display them in your Google search results.

This works okay in theory, but in reality you will have millions or even billions of records in your database, and that's when things start getting interesting. Just think about matching a query vector: if you want to match it against these stored vectors, one approach you can use is linear search, where you go one by one, and if the cosine similarity is close to 1 then you put that vector into your result set, and you keep on going and collecting your result vectors. Now you already realize the problem: if there are millions of stored vector embeddings, your computation is going to be too much. Your hair will stand on end, because you can't handle the delay and computational requirements for such a use case. You need to do something smart.

How do we do this in a traditional database? Well, we use a thing called an index. A database index helps you search things faster. Similarly, in this particular use case we can use a hashing function. We don't need to go into detail about what that hashing function is, but let's say this hashing function creates buckets of similar-looking embeddings. Then when you have a search query, you can let it go through the same hashing function, which will put it into one of these three buckets, and then within that bucket you can do a linear search. This way you are only matching against the vectors which are in bucket one; you don't have to match against bucket two and bucket three. This will speed things up, and this technique is called locality-sensitive hashing. This is one of the techniques that vector databases use. There are many such techniques, and they are outlined in this beautifully written article. I'll provide a link to this article so you can read through it.

So far, you realize that vector databases help you do faster
search. They also help you store things in an optimal way. So these are the two big benefits that explain why vector databases are gaining popularity. I hope this video helped you understand the intuition behind vector databases. If you have any questions, please post them in the comment box below. If you liked this video, please give it a thumbs up and share it with your friends. Thank you for watching. [Music]
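
The two search strategies described in the captions, linear search with cosine similarity and bucketing with locality-sensitive hashing, can be sketched roughly as follows. This is only an illustration under assumptions: the feature names and values are made up in the spirit of the handcrafted example, the 0.9 threshold is arbitrary, and the hashing function is random-hyperplane hashing, which is one common choice for cosine similarity and not necessarily the one shown in the video. All function names are hypothetical.

```python
import math
import random

# Hypothetical handcrafted embeddings. Features (assumed for illustration):
# [related_to_phones, is_location, has_stock, is_edible]
vectors = {
    "apple_company": [1.0, 0.0, 1.0, 0.0],
    "apple_fruit":   [0.0, 0.0, 0.0, 1.0],
    "orange":        [0.0, 0.2, 0.0, 0.9],
    "samsung":       [0.9, 0.0, 1.0, 0.0],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def linear_search(query, vectors, threshold=0.9):
    """Compare the query against every stored vector, one by one."""
    return [name for name, vec in vectors.items()
            if cosine_similarity(query, vec) >= threshold]

def make_hyperplanes(num_planes, dim, seed=0):
    """Random hyperplanes through the origin; each one contributes one hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]

def lsh_bucket(vec, hyperplanes):
    """Bucket key: which side of each hyperplane the vector falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )

def build_index(vectors, hyperplanes):
    """Group stored vector names by their bucket key."""
    index = {}
    for name, vec in vectors.items():
        index.setdefault(lsh_bucket(vec, hyperplanes), []).append(name)
    return index

def lsh_search(query, vectors, index, hyperplanes, threshold=0.9):
    """Hash the query, then run the linear search only inside its bucket."""
    candidates = index.get(lsh_bucket(query, hyperplanes), [])
    return [name for name in candidates
            if cosine_similarity(query, vectors[name]) >= threshold]

if __name__ == "__main__":
    query = vectors["apple_company"]
    planes = make_hyperplanes(num_planes=2, dim=4)  # up to 4 buckets
    index = build_index(vectors, planes)
    print("linear search:", linear_search(query, vectors))
    print("buckets:", index)
    print("LSH search:", lsh_search(query, vectors, index, planes))
```

The bucketed search is approximate: a similar vector that happens to land in a different bucket is missed, which is the price paid for not comparing against every stored vector. More hyperplanes give smaller buckets and faster searches, but more misses.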
Info
Channel: codebasics
Views: 9,944
Keywords: yt:cc=on, vector db, chroma vector db, pinecone vector db, what is vector db, langchain vector db, llm vector db, Milvus, semantic search, embedding
Id: 72XgD322wZ8
Length: 6min 52sec (412 seconds)
Published: Sat Sep 09 2023