Vector Search RAG Tutorial – Combine Your Data with LLMs with Advanced Search

Captions
In this course, I'll teach you how to use vector search and embeddings to easily combine your data with large language models like GPT-4. First, I'll teach you the concepts, and then I'll guide you through developing three projects. In the first project, we'll build a semantic search feature to find movies using natural language queries. For this, we'll use Python, machine learning models, and Atlas Vector Search. Next, we'll create a simple question-answering app that uses the RAG architecture and Atlas Vector Search to answer questions using your own data. And in the final project, we'll modify a ChatGPT clone so it answers questions about contributing to the freeCodeCamp.org curriculum based on the official documentation. If you like, you can use your own data or documentation instead. The first two examples use Python and the third uses JavaScript, but you should be able to follow along with just a basic knowledge of either. MongoDB provided a grant that made this course possible. Their Atlas Vector Search allows you to perform semantic similarity searches on your data, which can be integrated with LLMs to build AI-powered applications.

So let's start by talking about vector embeddings. Imagine you have a lot of different objects, like fruits, and you want to organize them in a way that shows how similar or different they are. In the real world, you might sort them by color, size, or taste. In the digital world, we can do something similar with data, and that's where vector embeddings come in. Vector embeddings are a digital way of sorting or describing things. Each item, like a word, an image, or anything else you can think of, is turned into a list of numbers. This list is called a vector. The cool part is that similar items will have similar vectors. By turning items into vectors, we can use math to understand and process them. For example, we can measure how close two vectors are to see how similar the items they represent are. Words can be turned into vectors, and words with similar meanings have vectors that are close together. This helps in tasks like searching for information, translating languages, or even chatting with AI. Creating these embeddings usually involves a lot of data and some complex math. The computer looks at many examples, like how words are used in sentences, and learns the best way to turn them into vectors.

Vector search is a method used to find and retrieve the information that is most similar or relevant to a given query. Instead of looking for exact matches like traditional search engines, vector search tries to understand the meaning or context of the query. In other words, vector search is a way to implement semantic search, which means using the meaning of words to find relevant results. Vector search uses vector embeddings by transforming both the search query and the items in the database, like documents, images, or products, into vectors, and then comparing these vectors to find the best matches. In essence, vector search leverages vector embeddings to understand the content and context of both the query and the database items. By comparing these vectors, it efficiently finds and ranks the most relevant results, providing a powerful tool for searching through large and complex data sets. Data from various sources and in different formats can be represented numerically as vector embeddings.
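To make "compare the vectors" concrete, here's a tiny sketch (not from the course code) that scores two toy embeddings with cosine similarity; the vectors and values are made up for illustration:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models produce hundreds of dimensions).
apple = [0.9, 0.1, 0.2]
pear = [0.8, 0.2, 0.3]
truck = [0.1, 0.9, 0.7]

print(cosine_similarity(apple, pear))   # close to 1: similar items
print(cosine_similarity(apple, truck))  # much lower: dissimilar items
```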
Atlas Vector Search allows you to store vector embeddings alongside your source data and metadata, leveraging the power of the document model. These vector embeddings can then be queried using an aggregation pipeline to perform fast semantic similarity search on the data using an approximate nearest neighbors algorithm. In this course, I'll be demonstrating how to use Atlas Vector Search in your applications. Atlas has a basic free-tier M0 cluster that you can use that is free forever.

In this first project, we'll implement semantic search for movie recommendations. We'll be using a sample movie data set containing over 20,000 documents in MongoDB, and we'll be using the all-MiniLM-L6-v2 model from Hugging Face for generating the vector embeddings at index time as well as query time. But you can apply the same concepts using a data set and model of your choice. Our code will find movies with semantic search using the vector search process I described earlier. All the data, including our embeddings, will be stored in a MongoDB instance, so we'll be connecting to a MongoDB instance. But first we have to create one, and before we can create one, we need an account.

So let's create a MongoDB Atlas account. You'll either click "Try Free" or, if you already have an account, "Sign In". I already have one, so I'm just going to sign in. If you're making an account for the first time, it will probably walk you through creating a project. Since I already have an account, I'll go to New Project, call it "GenAI", and click Next and Create Project. Now I need to create a deployment. I'm going to select the free option; it doesn't matter much which provider you choose, and it will set everything up for you. I'll leave pretty much everything at the defaults and click Create. You can see it's provisioning down in the corner. Here we can set up authentication: I'm going to create this user, and it automatically gives us a username and password. We're going to be working in our local environment, and it has already auto-filled my IP address, so I can click Finish and Close.

We're going to be using some sample data pertaining to movies. You could use your own data for this project, but MongoDB offers some sample data, so I'll click Load Sample Data. You can see it says it's loading the sample data set to Cluster0. This loads quite a few different sample databases, and we're just going to be using the one related to movies, which is called sample_mflix. We'll look through the sample data in a minute, but while it's loading, let's go over to Visual Studio Code. You can also do this in a Jupyter notebook or any other IDE, but I'll be using Visual Studio Code. You can see I already have a file created, movie_recs.py. First we need to install the PyMongo package, so I'll go to the terminal and run pip install pymongo (or pip3 install pymongo). Okay, now we see the sample data set is deployed, so I'll go to Browse Collections. You can see we have all these different collections here; we're just going to be using the sample_mflix database.
It has a bunch of movies. If I go to movies here, you can see the first one, The Great Train Robbery from 1903. It has a plot, the runtime, the directors, the rating, and a bunch of other information about the movie. And if we look up here, there are over 21,000 movies in this database, from really old ones to newer movies.

So let's get connected to this database from our local environment. In this file, I'm going to paste in some code. We're going to import pymongo, and then we're going to connect to our database where it says "your MongoDB URI"; I'll have to go get that now. To get it, I'll go back to this GenAI project, view the deployment, and go to Connect. There are a few different ways you can connect to your database; I'm going to go with the first one, Drivers. You can go through and select all these options, but that's kind of optional, because we already installed PyMongo; we really just need this connection string right here. So I'm going to copy the connection string and go back over here. For this example, I'm going to hard-code it into the file. That's not best practice; you really want to use an environment variable. But since I'm just running this locally, it's not going to production, and I'm not putting it on GitHub, I'm just going to put it right into this file. As for the password, you may see it on screen, but I'm going to delete it before I post this video. When I was setting everything up, I could have used a password I'd remember or copied my password, but I forgot to do that. That's no problem: we can just go to Security, Database Access, find the account, go to Edit, and then Edit Password. I'll generate a new password, which is just a bunch of random characters, copy it, and click Update User. Now I can go back over here and put in the password.

Now let's do a quick test to see if this is working. I'm going to print collection.find().limit(5), so it finds the first five items. I'll save that and try it in the terminal. Okay, the first run just printed the cursor object, so let's iterate over it and print the documents, and try that one more time. Good, we got the first five items in the database, so we successfully connected to our database.
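Here's a minimal sketch of that connection test, assuming a hard-coded connection string (an environment variable is better practice) and the sample_mflix database:

```python
import pymongo

# Hard-coded for local experimentation only; use an environment variable in real code.
client = pymongo.MongoClient("mongodb+srv://<username>:<password>@cluster0.example.mongodb.net/")
db = client.sample_mflix
collection = db.movies

# Quick sanity check: print the first five movie documents.
for doc in collection.find().limit(5):
    print(doc)
```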
The next step is to set up the embedding creation function. There are many options for creating embeddings, like calling a managed API, hosting your own model, or having a model run locally. You can create embeddings using the OpenAI API, which costs some money, but there's a free way to do it: we're going to use the Hugging Face Inference API with the all-MiniLM-L6-v2 model. Hugging Face is an open-source platform that provides tools for building, training, and deploying machine learning models, and it makes it easy to use machine learning models via API and SDK. So once you're on Hugging Face, make sure you create an account or log in; I'm already logged into my account here. Then we go to Settings, Access Tokens, and create a new token. I'll call the token "GenAI", and you want to make sure it's set to Read; then we can use this token to authenticate to the Hugging Face Inference API, so I'll click Generate Token.

Now we're going to define a function that can generate embeddings. We'll just set it up now and run it in a minute. I'm going to get rid of the earlier test code, paste this in, and move the import of requests to the top. Here we have to put in our token. Again, you don't want to put it in your file if you're going to put this into production or on GitHub; you'd use an environment variable. But to keep it simple, and since it's only local, I'm putting it here, and I'll delete this token before I publish the video, so don't worry about it being revealed. So here's my token, and then we have this embedding URL, and we're going to generate the embedding. You can see the function takes one argument, text, which is expected to be a string, and its annotation indicates it returns a list of floats; that's a type hint suggesting the output will be a list where each element is a floating-point number. Then we make the request with requests.post to the embedding URL, passing in the token and the text, and the function returns the embedding, that list of floats. We'll add a final line to test it: generating an embedding for "freeCodeCamp is awesome". Let's run it. It says the model is currently loading, so let's just try again. Okay, this time we didn't get the error, but we forgot to print the result, so let's print it. So that first error was just the model being temporarily unavailable, and here is the embedding: a list of floating-point numbers, the embedding vector generated for "freeCodeCamp is awesome", derived using the Hugging Face transformer model. Now, the Hugging Face Inference API we're using is free to begin with; it's meant for quick prototyping and has strict rate limits. It is possible to set up a paid Hugging Face Inference Endpoint, which creates a private deployment of the model for you in case you run into rate limits, but we shouldn't need that right now.
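For reference, here's a minimal sketch of that embedding function, assuming the Hugging Face feature-extraction endpoint for all-MiniLM-L6-v2; the exact URL and error handling may differ slightly from the course code:

```python
import requests

hf_token = "<your Hugging Face token>"  # local demo only; use an environment variable otherwise
embedding_url = (
    "https://api-inference.huggingface.co/pipeline/feature-extraction/"
    "sentence-transformers/all-MiniLM-L6-v2"
)

def generate_embedding(text: str) -> list[float]:
    """Turn a string into a 384-dimensional embedding via the HF Inference API."""
    response = requests.post(
        embedding_url,
        headers={"Authorization": f"Bearer {hf_token}"},
        json={"inputs": text},
    )
    if response.status_code != 200:
        raise ValueError(f"Request failed {response.status_code}: {response.text}")
    return response.json()

print(generate_embedding("freeCodeCamp is awesome"))
```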
Okay, the next step is to create and store some embeddings based on our database. We're going to execute an operation that creates a vector embedding for the data in the plot field of our movie documents and stores it in the database. The plot field is just the plot of each movie in the database. For instance, for The Saphead: "The simple-minded son of a rich financier must find his own way in the world", and the pop-up will show you the rest of the plot. So there's a summary of the plot of each movie in the database, and we're going to create an embedding based on that plot field. That will allow us to search and find movies with similarities in their plots, and we'll be able to use natural language to do it. Creating vector embeddings with a machine learning model is necessary for performing a similarity search based on intent.

So let's update our code again; I added a new section. This is going to create vector embeddings for the first 50 documents in our data set that have a plot field: for each doc in the collection where plot exists, limited to 50, we generate the embedding based on the plot. You can see we're basically adding a new field to the document, called plot_embedding_hf, and then collection.replace_one updates the document in the database with the new information, which is all the same except for this new field. Now, in this code we're storing the vector embedding in the original collection, meaning alongside the application data. This could also be done in a separate collection: instead of replacing the document in the same collection with one that has the embedding, we could keep a separate collection for the embeddings. It just depends on your use case. So I'm going to run this code. We're not printing anything, so we shouldn't see anything in the terminal, but there will be a way to find out whether it worked. Let's just wait until it ends.

Okay, it finished running. Now I can go back to the database and refresh; there's a refresh button right here. If I scroll down, let's see if we have the new field. This document now has the plot_embedding_hf field, and these are the vector embeddings we just created with the Hugging Face API. There are quite a few: 384 values for this one. We did it for the first 50 documents, so if I scroll down a lot, to results 61 through 80, we can see this document doesn't have the field, because we only did it for the first 50. Ideally, for this project, you would do it for every movie in the database. But because of the rate limits, and because of the time it would take to run through all 20,000-plus records, I'm just doing the first 50. If you want, you can take off the limit of 50 and run it for every document in the collection; that will give the best results when you do the search. But if you're going to do all the documents, you may need the paid inference endpoint I mentioned earlier.
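The backfill section itself is short. A minimal sketch, continuing from the collection handle and generate_embedding function in the sketches above:

```python
# Create and store embeddings for the first 50 movies that have a plot.
for doc in collection.find({"plot": {"$exists": True}}).limit(50):
    doc["plot_embedding_hf"] = generate_embedding(doc["plot"])
    # Replace the stored document with the same document plus the new field.
    collection.replace_one({"_id": doc["_id"]}, doc)
```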
Okay, the next step is to create a vector search index. I'll go back over to MongoDB and open the Search tab. Now we're at Atlas Search, and we're going to create an index, so let's click Create Search Index, choose the JSON Editor, and click Next. There are a few steps in this section. First, we select the database and collection on the left: the sample_mflix database and the movies collection. Then we enter the index name; we'll call it PlotSemanticSearch. Then I'll paste in some JSON. Let's look at it. "dynamic": true indicates that MongoDB should automatically index all fields in documents added to this collection; if you have a varying or unknown schema, you may want to set it to false, so only fields explicitly defined in the mappings are indexed. Then, under fields, we have the name of the field in the MongoDB documents that we want to index; remember, we already looked at that. When we were looking at the example in the database, we saw there were 384 numbers under that field (they start at index zero, so they run from 0 to 383). The dimensions setting specifies the dimensionality of the vectors stored in the field, which is important for the indexing and search algorithms to function correctly. Then "similarity": "dotProduct" defines the similarity metric used when performing searches against this vector field: the dot product between vectors measures how similar they are, where a higher dot product indicates greater similarity; it's a common choice for normalized vectors. And "type": "knnVector" indicates the field is a k-nearest-neighbors vector type; knnVector fields allow you to perform efficient similarity searches to find documents whose vectors are similar to a query vector. Then we can just click Next and then Create Search Index, and you can see the search index build is now in progress. It takes a little time. Okay, it's done: the index is built. There's a button here to test it, but we're not going to use that; let's go back over to our code.

I'm going to add a new code snippet that I'll explain. First of all, I commented out all of the earlier code. We could have made two separate Python files, one to generate the embeddings and one to use them, but since we already generated them, we don't need to generate them again, so I commented all that out. This new code will try out and use the embeddings: it performs a vector search on the MongoDB collection using the vector search aggregation pipeline stage, finding documents whose plot_embedding_hf field is semantically similar to the provided query. The query is "imaginary characters from outer space at war", and you can try changing it to something else. Where it builds the query vector with generate_embedding(query): oh, we actually do need part of the code I commented out. We don't need to regenerate all the stored embeddings, but we do need to generate the embedding for the query string; that's important. We generate the embedding, the list of numbers, for what we're searching for. To backtrack a bit, collection.aggregate executes an aggregation pipeline on the collection, and the aggregation pipeline is a powerful feature in MongoDB that lets you process data and return computed results. Like I said, this is the specific stage that performs a vector search, and we've already covered generating the query embedding and the path, basically the field we're searching against. Then numCandidates: 100 is an optimization parameter that tells MongoDB how many candidate matches to consider internally before returning the final results; a higher number can improve the accuracy of the results but may increase the computational cost. The limit of 4 restricts the results to the top four matches; you may want to return more than four, like 10 or 20, but we'll keep it at four. And the index, PlotSemanticSearch, specifies the name of the index to use for the search, which is the one we just created.
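Assembled, the query looks roughly like this, continuing from the sketches above; I'm assuming the $vectorSearch stage name and option layout here, so adjust to whatever stage and syntax your Atlas version and the course code actually use:

```python
query = "imaginary characters from outer space at war"

results = collection.aggregate([
    {
        "$vectorSearch": {
            "queryVector": generate_embedding(query),  # embed the query the same way as the data
            "path": "plot_embedding_hf",               # field holding the stored embeddings
            "numCandidates": 100,                      # candidates considered internally
            "limit": 4,                                # top matches to return
            "index": "PlotSemanticSearch",             # the Atlas search index we created
        }
    }
])

for document in results:
    print(f"Movie Name: {document['title']}\nMovie Plot: {document['plot']}\n")
```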
So the final part of the code goes through every item in the result and prints the movies. Now remember, we only actually created embeddings for 50 movies; if we had created embeddings for everything instead of limiting to 50, the results would be better. But let's see what we get when we run it on 50 movies. Well, the first thing is to fix the code; I had some mismatched curly braces and brackets. Okay, remember the query was "imaginary characters from outer space at war", and we got four results. This one, you can see, is about going to war, so that's kind of similar; this one's about going to war too. All of them have to do with war, but none of them seem to have anything to do with outer space. That's probably just because the first 50 movies didn't include many outer-space ones. Prakul Agarwal created embeddings for all 20,000-plus movie plots in the database, ran the same search term, and came up with these movies. We only created 50 embeddings, but if we had created embeddings for all of them, these are the results we would get: looking at every movie in the database with embeddings, it comes up with movies that much more closely match the query about imaginary characters from outer space at war.

This project used the free Hugging Face API to get the embeddings. You can also use the OpenAI API to get embeddings; it's a paid API, but the sample database makes trying it a lot easier. We added our embeddings to this movies collection, but there's already an embedded_movies collection as part of the sample data you get, and it already has a plot embedding for each document, all 3,483 of them, created with the OpenAI API. When you're querying, you basically have to use the same embedding model that indexed the data, so we can't use the Hugging Face model to search against these embeddings that were created with the OpenAI API. But let me show you quickly how we could use the OpenAI embeddings API to query against the embeddings that are already here. Then we wouldn't just be querying against the few embeddings we created; we'd be querying against every document in the collection.

First, we have to create a search index, pretty much just like before: Create Search Index, JSON Editor. We choose the embedded_movies collection, and for the index name, same as before, we'll call it PlotSemanticSearch. The JSON is a little different: the number of dimensions is different, the path is plot_embedding instead of plot_embedding_hf (HF for Hugging Face), and for the similarity, this time we're going to use euclidean. I'll click Next and create the search index. Now, I already went ahead and updated the code in a second file, movie_recs2.py, to use the OpenAI API, so let me just show you the differences. We import the OpenAI package and set the API key; again, you never want to put the key right in the file, this is just for demonstration purposes, and I'll be deleting this API key before the video goes live. Everything else is the same, but the body of the generate_embedding function has changed: now it calls OpenAI's embedding-creation endpoint with OpenAI's embedding model instead of the Hugging Face API. The only other change is that the path in the search is now plot_embedding instead of plot_embedding_hf. That's basically all that's different.
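A minimal sketch of the swapped-in function, assuming the current openai Python client and the text-embedding-ada-002 model; the client interface in the course code may be slightly older:

```python
from openai import OpenAI

client = OpenAI(api_key="<your OpenAI API key>")  # demo only; prefer an environment variable

def generate_embedding(text: str) -> list[float]:
    """Embed a string with OpenAI's embedding model (1536 dimensions for ada-002)."""
    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=text,
    )
    return response.data[0].embedding
```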
So let's try running this code. Remember, the query is about imaginary characters from outer space at war, and now the results actually relate to outer space: extraterrestrials who dominate Earth, so we have aliens, imaginary characters, at war; we have Bender's Game, which involves Planet Express, so that's related to planets and war; and Guardians of the Galaxy, which is more imaginary characters in outer space at war. Now that we're searching the entire collection, we get results that much more closely match our query. There's no way we could get results like this with normal keyword searching. But when we search using vector embeddings, we get semantic search: we can use natural language to find the movies we actually want. And if you're not using a movie database, whatever is in your database, this gives a lot more power to your searching.

So we've demonstrated how to use the Hugging Face Inference API, how to generate embeddings, and how to use Atlas Vector Search, and we learned how to build a semantic search application that finds movies whose plots most closely match the intent behind a natural language query, rather than searching on the keywords present in the data set. We also demonstrated how efficient it is to bring the power of machine learning models to your data using the Atlas developer data platform.

In this next project, we'll look again at using Atlas Vector Search as a vector store, and at how to deal with some limitations of large language models. We'll also learn about retrieval-augmented generation, or RAG, and I'll demonstrate how to develop a real-world project that uses these technologies and concepts, along with the LangChain framework, OpenAI models, and Gradio. Let's discuss some of the limitations of LLMs that we'll be able to overcome using vector search. LLMs sometimes generate factually inaccurate or ungrounded information, a phenomenon known as hallucination. LLMs are trained on a static data set that was current only up to a certain point in time, which means they might not have information about events or developments that occurred after the training data was collected. LLMs don't have access to a user's local data or personal databases; they can only generate responses based on the knowledge they were trained on, which limits their ability to provide personalized or context-specific responses. And LLMs have a maximum limit on the number of tokens, or pieces of text, they can process in a single interaction.

The retrieval-augmented generation, or RAG, architecture was developed to address these issues. RAG uses vector search to retrieve relevant documents based on the input query, and then provides these retrieved documents as context to the LLM to help it generate a more informed and accurate response. That is, instead of generating responses purely from patterns learned during training, RAG uses relevant retrieved documents to inform its responses. This helps address the limitations of LLMs.
Specifically, RAG minimizes hallucinations by grounding the model's responses in factual information. By retrieving information from up-to-date sources, RAG ensures that the model's responses reflect the most current and accurate information available. While RAG does not directly give LLMs access to a user's local data, it does allow them to utilize external databases or knowledge bases, which can be updated with user-specific information. Also, while RAG does not increase an LLM's token limit, it does make the model's use of tokens more efficient by retrieving only the most relevant documents for generating a response. This section of the tutorial will demonstrate how the RAG architecture can be leveraged with Atlas Vector Search to build a question-answering application against your own data.

Let me quickly tell you about some of the other technologies we'll use for this project. LangChain is a framework designed to simplify the creation of LLM applications. It provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications, which allows AI developers to build LLM applications that leverage external sources of data. In LangChain, a chain refers to a sequence of components linked together to process language tasks. Each component in the chain performs a specific function, like understanding a query or generating a response, and passes its output to the next component in the sequence. This modular approach of chaining different components simplifies complex application development, debugging, and maintenance. This is not a LangChain tutorial, so I'll only briefly describe those sections of our code. Gradio is an open-source Python library that's used to build web interfaces for machine learning and data science applications. And we'll be using the OpenAI API to access two different models: OpenAI's embedding model, which we'll use to create embeddings, and a large language model, which we'll use to generate text.

Now let's go to the code editor. We're going to see how all these concepts work by creating a question-answering app that can answer questions from our custom data. We'll start by installing a bunch of packages; a lot of these are ones we've talked about, like LangChain, PyMongo, OpenAI, and Gradio, and you'll see how we use all of them as we develop our application. You're going to need an OpenAI account, and you'll have to have enough credits, which basically means you'll probably need a paid account. So I'll go over to API Keys and create a new key; let's call it "answer". I'll be revoking this key before this tutorial goes out, so don't worry about it being on screen. I already have three files made, and I'm going to save the OpenAI key, as openai_api_key, in the key_param.py file. In this file, we're also going to save our Mongo URI; we'll just use the same Mongo URI we used before, which we can get from Connect, making sure to update the password. Then I'll save that file; we shouldn't need to touch it anymore.
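That secrets file is tiny. A sketch of what it might contain; the variable names here are my assumptions, so match whatever the other scripts import from key_param:

```python
# key_param.py -- local-only secrets file; don't commit this to version control.
openai_api_key = "<your OpenAI API key>"
MONGO_URI = "mongodb+srv://<username>:<password>@cluster0.example.mongodb.net/"
```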
Now we're going to use two scripts. load_data.py will be used to load the documents and ingest the text and vector embeddings into a MongoDB collection, and extract_information.py will generate the user interface and allow us to perform question answering against our data using Atlas Vector Search and OpenAI. We import the same libraries into each file: PyMongo, OpenAIEmbeddings, MongoDBAtlasVectorSearch, the DirectoryLoader, OpenAI, RetrievalQA, Gradio, and then key_param, which is basically just that other file.

Now we're going to get our sample documents. In this tutorial, we'll be loading three text files from a directory using the DirectoryLoader. These files should be saved in a directory named sample_files, so I'll create a new directory called sample_files, and inside it we'll have three files: log_example.txt, chat_conversation.txt, and aerodynamics.txt. I'm just going to paste some text into each of them; if you look in the description, you can get all the code and text we're using in this tutorial. Into log_example.txt goes a bunch of log output. For chat_conversation.txt, we'll put a sample chat conversation; we can go through it. Alfred: "Hi, can you explain to me how compression works in MongoDB?" Bruce: "Sure. MongoDB supports compression of data at rest. It uses either zlib or snappy compression algorithms at the collection level," and he goes on to explain how that works. Then the conversation continues. Alfred: "Interesting, that's helpful to know. Can you also tell me how indexes are stored in MongoDB?" Bruce: "MongoDB indexes are stored in B-trees," and so on. So basically, it's information about how MongoDB works. The last file is aerodynamics.txt, which gives information about aerodynamics: boundary layer control, achieved using suction or blowing methods, can significantly reduce the aerodynamic drag on an aircraft's wing surface; the yaw of an aircraft, indicative of its side-to-side motion, is crucial; and so on. All of this is information we'll be able to access from our chatbot. Okay, we don't need these files open anymore.

Next is a bit more code that we'll use in both the load_data and extract_information files, before they start to differ. This is how we access our database: first we create the MongoDB client from our Mongo URI. Our database name is langchain_demo, the collection name is collection_of_text_blobs, and then we get access to the collection; both files need that. Now we'll focus on the load_data.py file. The next thing we do is initialize the DirectoryLoader: you can see it loads the files from the sample_files directory, and we load the data. Then this next line defines the OpenAI embedding model we want to use for the source data. The embedding model is different from the language-generation model. We're using the OpenAI embeddings model; earlier we showed how to get embeddings with a Hugging Face model, but you can also get embeddings from OpenAI.
In this next line, we initialize the vector store: we vectorize the text from the documents using the specified embedding model and insert them into the specified MongoDB collection. That's what this whole call is doing. You can see we're passing in the data, the embeddings model, and the collection to put them into, using the MongoDBAtlasVectorSearch class that we imported at the top. With that, this script is actually done, so let's run it to create all the embeddings and put them into the MongoDB collection: python load_data.py. Okay, it should be finished, but let's check. Over in our collections, look: we have our langchain_demo database, and here's the collection_of_text_blobs collection. We have the embeddings based on the first document, chat_conversation.txt, the embeddings based on the aerodynamics document, and the log_example document. We can see the array; here's the actual array of all the floating-point numbers. Let's see how many there are this time: 1,536 (the indices run from 0 to 1,535). So we've successfully created our embeddings.
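Putting the load_data.py pieces together, here's a minimal sketch under the names used in this walkthrough (reconstructed, not copied from the course repo), with LangChain import paths as they were around the time this course was published:

```python
from pymongo import MongoClient
from langchain.document_loaders import DirectoryLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import MongoDBAtlasVectorSearch
import key_param

# Connect to the database and collection that will hold the text + embeddings.
client = MongoClient(key_param.MONGO_URI)
collection = client["langchain_demo"]["collection_of_text_blobs"]

# Load every .txt file in ./sample_files as a LangChain document.
loader = DirectoryLoader("./sample_files", glob="./*.txt")
data = loader.load()

# Embedding model (separate from the text-generation model used later).
embeddings = OpenAIEmbeddings(openai_api_key=key_param.openai_api_key)

# Vectorize the documents and insert text + embeddings into the collection.
vector_store = MongoDBAtlasVectorSearch.from_documents(data, embeddings, collection=collection)
```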
The next step is to create our Atlas Search index. I'll go to Search; we have the one from before, but I'm going to create a new search index using the JSON editor, and we have to choose the right database and collection. Now we update the JSON. This time we'll just use the default index name, and everything is pretty similar, except our field is called embedding, and the dimensions are different from the earlier example. This time we're using cosine similarity, and the editor tells us what that means: it measures similarity based on the angle between vectors, allowing similarity measurements that aren't scaled by magnitude. Depending on what you're trying to do, you'll use a different similarity metric. We'll use the same type, knnVector. Okay, I'll create the search index, and the build is now in process.

While that build is happening, we're going to fill out the extract_information.py script. The first part is basically the same: we connect to the client and the same collection. But now it starts to get a little different, since we're going to extract information. First, we define the OpenAI embedding model we want to use, just like before, and we also get access to our vector store, passing in the collection and the embeddings model. So now we have access to the vector store that lives on Atlas.

Now, here's our query_data function. It takes a query as its input: the user is going to send a query, and the function processes it and retrieves relevant information using a few different tools. The first step in the function converts the input query into a vector using OpenAI embeddings; this is crucial for preparing the query for similarity searches in a vector space. Then it performs an Atlas vector search using LangChain's vector store: the similarity_search call, passing the query with k=1, retrieves the single most similar document from MongoDB based on the query vector. The next line extracts the page content from the top document in the returned list, which is the information most relevant to our original query. Next, we define an LLM using OpenAI's API; if not specified otherwise, the default model used is OpenAI's GPT-3.5 Turbo. Here the temperature is set to 0, which generally means the model's responses are more deterministic and less creative. Then we initialize a retriever for the MongoDB vector store; the retriever is responsible for fetching documents that are relevant to our query. Then we set up RetrievalQA; this chain is executed with our original query, and the process involves the LLM generating responses based on the retrieved documents and the nature of the query. You can see we load the "stuff" documents chain: it takes a list of documents, inserts them into a prompt, and passes that prompt to an LLM. Finally, we execute the chain, and the function returns two outputs: as_output, the content of the most similar document from the Atlas vector search, and retriever_output, the output generated by the RetrievalQA chain using the RAG architecture. So this script showcases the integration of advanced AI and database technology to create a powerful tool for information retrieval and processing: we combine the capabilities of OpenAI's language models, MongoDB Atlas Vector Search, and LangChain to efficiently process and answer complex queries.

Next, we create a web interface for the app using Gradio. I'm not going to go into a lot of detail about this, because it's not really what this tutorial is about, but you can see it's pretty simple: Gradio gives us a text box to enter your question and a submit button. One thing I want to point out is that there are two outputs, corresponding to the two outputs from the function. The first is what you'd get using only Atlas Vector Search; the second is generated by chaining Atlas Vector Search to LangChain's RetrievalQA and the OpenAI LLM. So we can compare what the output looks like with plain Atlas Vector Search versus with all these other tools chained together.
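And here's a minimal sketch of extract_information.py along those lines; again the names are reconstructed, and the Gradio layout is a simplified stand-in for the course's interface:

```python
import gradio as gr
from pymongo import MongoClient
from langchain.chains import RetrievalQA
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import MongoDBAtlasVectorSearch
import key_param

client = MongoClient(key_param.MONGO_URI)
collection = client["langchain_demo"]["collection_of_text_blobs"]

embeddings = OpenAIEmbeddings(openai_api_key=key_param.openai_api_key)
vector_store = MongoDBAtlasVectorSearch(collection, embeddings)

def query_data(query: str):
    # Output 1: the raw text of the single most similar document (k=1).
    docs = vector_store.similarity_search(query, k=1)
    as_output = docs[0].page_content

    # Output 2: RAG -- retrieve relevant docs, "stuff" them into a prompt, ask the LLM.
    llm = OpenAI(openai_api_key=key_param.openai_api_key, temperature=0)
    retriever = vector_store.as_retriever()
    qa = RetrievalQA.from_chain_type(llm, chain_type="stuff", retriever=retriever)
    retriever_output = qa.run(query)

    return as_output, retriever_output

# Simple web UI: one question in, two answers out (plain search vs. RAG).
with gr.Blocks() as demo:
    textbox = gr.Textbox(label="Enter your question:")
    button = gr.Button("Submit")
    output1 = gr.Textbox(label="Output with just Atlas Vector Search")
    output2 = gr.Textbox(label="Output with Atlas Vector Search + RetrievalQA + OpenAI LLM")
    button.click(query_data, inputs=textbox, outputs=[output1, output2])

demo.launch()
```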
Let's test it out: python extract_information.py, and we just go to this URL. I'll enter a question: "Did any error occur on August 16? If yes, what was the error caused by?" The output with just Atlas Vector Search returns the text as-is: basically the entire raw log entry, without doing anything with it. But when we chain the search to LangChain's RetrievalQA and the OpenAI LLM: "Yes, an error occurred on August 16. The error was caused by a resume of change stream that was not possible, as the resume point may no longer be in the oplog." That's much easier to read. Let's try one more: "What questions did Alfred ask? What were Bruce's answers? Please summarize in bullet points." In the first output we basically get the entire conversation, the entire field, but down here it's actually able to do it: Alfred asked how compression works in MongoDB and how indexes are stored in MongoDB, and so on, and we get the answers: MongoDB supports compression of data at rest using either zlib or snappy compression algorithms at the collection level. So now it's giving the answer in a way that's easier to understand. We can also use this for sentiment analysis. I'll ask: "What was the overall sentiment of Alfred's chat with Bruce? What was the likely CSAT?" Again, the first output just gives us the entire chat, but down here: Alfred's chat with Bruce was likely very positive; the conversation was focused on technical details, and Bruce was able to provide helpful explanations, so the likely CSAT would be high. We can also use this for precise answer retrieval: when there's a lot of information, how can we get an exact answer? How about "What is yaw?" The first output has the relevant information but gives us a lot more than we need, basically the whole text. But down here we get: the yaw of an aircraft, indicative of its side-to-side motion, is controlled primarily by the rudder. You can see where it comes from in the source text, and we don't get all the other information we don't need.

So in this section of the tutorial, we've seen how to build a question-answering app to converse with your private data, using Atlas Vector Search as a vector store while leveraging the retrieval-augmented generation architecture with LangChain and OpenAI. Vector stores, or vector databases, play a crucial role in building LLM applications, and RAG is a significant advancement in the field of AI, particularly in natural language processing. By pairing these together, it's possible to build powerful AI-powered applications for various use cases.

In this final project, we'll modify a ChatGPT clone so it answers questions about contributing to the freeCodeCamp.org curriculum, based on the official documentation. You can use these concepts to create your own chatbot that connects with your own data to answer questions. This project has some similarities to the last one, but takes things to another level. So let's look at a ChatGPT-type application that gets information from your own documentation; we'll be using the freeCodeCamp documentation. A lot of this code is already developed; I'll walk you through it and we'll add some things. If you're following along at home, start by cloning the repository (there's a link in the description) and running npm install. Then you have to configure the application: there's a .env.example file, and you'll create a new file that's just .env and fill in your OpenAI key and your MongoDB Atlas URI. I already showed you how to get that URI in the previous part of the tutorial; you can even use the same URI we used for the first project. For all these projects, you can use the same MongoDB Atlas URI, with the username and password right in there. After that, we can test the application with npm run dev. Here's what the application looks like: right now it's basically just like ChatGPT, and it does not use our own documents. So I can say, "write me a poem about being a web developer."
It's using the OpenAI API to do this, but it's not accessing any custom information. So now let's see how we can make it draw data from the freeCodeCamp documentation. First, I want to show you the documentation I'm talking about. This is specifically the documentation on how to contribute to freeCodeCamp. There are a lot of sections: Getting Started, how to help with translation, and the biggest section, how to contribute to the freeCodeCamp.org codebase. freeCodeCamp.org is an open-source project, and you can help out with the codebase; the docs cover how to set it up locally, best practices, how to work on the codebase, and so on, plus some additional guides down here. What we're going to do is put all of this documentation into our chat app. If you clone the repo, all the documentation will be right there in the assets folder, in fcc_docs: the markdown files of all the documentation on that page. We're going to put all this documentation into our chat app by creating embeddings for all of it and then creating a vector search index. It's going to be similar to what we did in our first project, but now we're doing it with more data, in more of a real-world project.

There's already some code here to create the embeddings. If we go to create-embeddings.mjs, let's look at the code; some of it will look similar to code we've already seen. First, we import the necessary modules: fs for the file system operations; the RecursiveCharacterTextSplitter for splitting text into chunks; MongoDBAtlasVectorSearch for storing vectors in MongoDB Atlas; OpenAIEmbeddings from LangChain for creating the embeddings; the MongoClient for connecting to the MongoDB database; and dotenv for the environment variables. Then, much like you've already seen, we create a MongoDB client using the connection URI from the environment variable, and the script connects to the docs database and the embeddings collection. Next, we specify the directory containing our documents and read the file names using fsp.readdir; these are all our markdown files from the freeCodeCamp documentation. Then we start processing each document: we read the content and log the file name, and the next steps vectorize the document. In this section, we create a RecursiveCharacterTextSplitter configured for markdown, and we split each document into chunks of 500 characters with a 50-character overlap. Then we create the embeddings: we use MongoDBAtlasVectorSearch to create an embedding for each chunk, using the OpenAIEmbeddings class, and we store them in the MongoDB Atlas collection. Finally, we close the connection. So the script processes the documents, creates embeddings, and stores them in MongoDB Atlas using the LangChain library.
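The course script is JavaScript; to keep the examples in one language, here's a rough Python sketch of the same flow (markdown-aware splitting into 500-character chunks with 50 characters of overlap, then embedding and storing each chunk), where the directory path and database names are assumptions:

```python
import os
from pymongo import MongoClient
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import Language, RecursiveCharacterTextSplitter
from langchain.vectorstores import MongoDBAtlasVectorSearch

client = MongoClient(os.environ["MONGODB_ATLAS_URI"])
collection = client["docs"]["embeddings"]

# Markdown-aware splitter: 500-character chunks with 50 characters of overlap,
# so content cut at a chunk boundary still appears intact in one chunk.
splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.MARKDOWN, chunk_size=500, chunk_overlap=50
)

docs_dir = "assets/fcc_docs"  # assumed location of the markdown files
for file_name in os.listdir(docs_dir):
    with open(os.path.join(docs_dir, file_name)) as f:
        print("Vectorizing", file_name)
        chunks = splitter.create_documents([f.read()])
        # Embed each chunk and store text + vector in the Atlas collection.
        # OpenAIEmbeddings reads OPENAI_API_KEY from the environment.
        MongoDBAtlasVectorSearch.from_documents(
            chunks, OpenAIEmbeddings(), collection=collection
        )

client.close()
```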
Now we're just going to run it; I'll run it in the terminal down here. Just so you know how I'm running it: in package.json we have an embed script that just calls that file, so I can do npm run embed. We can see it's creating embeddings from all these documents, vectorizing each one. Okay, we're done. Now let's view the embeddings in the database. Back in MongoDB Atlas, I'll go to Browse Collections, and here's the new one: the docs database with the embeddings collection. These are the embeddings we just added. If we scroll down, we can see the embeddings for each chunk; looking at this one, the embedding array has 1,536 entries. You can see some of the text right in here: the script broke the text into segments and created an embedding for each one.

At this point, we need to create the vector search index. This is very similar to what we did before. I'll scroll up to the top, go to Search, click Create Search Index, and go to the JSON editor. Then we select the database and collection, docs and embeddings, and leave the default index name. In the JSON, the field is embedding, the dimensions match the model, and the similarity is cosine, which measures similarity based on the angle between vectors, allowing similarity measurements not scaled by magnitude. The type is knnVector. Then Next, then Create Search Index, and now we just have to wait for the build to complete.

While that happens, we're going to update the API routes to use these embeddings. Let's look at the current API route first. Again, we're not building this app from scratch; the code is already given. This is the default API route for chat, and it basically just uses the OpenAI API without any of our extra information. We import the required modules: StreamingTextResponse for handling streaming text responses; LangChainStream for setting up a streaming pipeline; Message for representing the chat messages; ChatOpenAI, which lets us interact easily with OpenAI's GPT models; and AIMessage and HumanMessage from the LangChain schema for structuring messages. We set the runtime to "edge", indicating the code is designed to run at the edge. Then we handle POST requests: we extract the messages payload from the incoming JSON request, and we initialize a LangChain streaming pipeline, obtaining a stream and a set of handlers to manage the flow of text data. Next, we create an instance of ChatOpenAI, with streaming mode specified to enable real-time interactions with the OpenAI GPT model. Then we use the call method on our ChatOpenAI instance to process the messages: each message is mapped to either a HumanMessage or an AIMessage based on its role (the user or the AI), and the processed messages are sent through the LangChain stream using the provided handlers. Finally, we return the streaming response.
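That route is TypeScript, but its core, mapping chat roles onto LangChain message types and calling a streaming chat model, looks roughly like this Python sketch; the history payload here is made up for illustration:

```python
from langchain.chat_models import ChatOpenAI
from langchain.schema import AIMessage, HumanMessage

# Streaming mode emits tokens as they're generated instead of waiting
# for the full completion (the TS route forwards these through a stream).
llm = ChatOpenAI(streaming=True, temperature=0)

# Shape of the payload the route receives: a list of {role, content} messages.
history = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello, how can I help?"},
    {"role": "user", "content": "Write me a poem about being a web developer."},
]

# Map each message to the LangChain type matching its role.
messages = [
    HumanMessage(content=m["content"]) if m["role"] == "user"
    else AIMessage(content=m["content"])
    for m in history
]

response = llm(messages)
print(response.content)
```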
So now we're ready to implement changes to our API route in order to utilize vector search with our new embeddings. Let's implement a new route dedicated to handling vector search. We start by creating a new directory called vectorSearch, and inside it I'll create a file called route.ts and add some code. Let's talk about this code. We import the appropriate modules: one for creating embeddings and one for vector search. In this route we'll connect to MongoDB, but differently than we did before, because the Mongo client promise, you'll notice, is imported from the app's lib/mongodb module. This is a promise that resolves to a MongoDB client; we use it to globally cache our MongoDB client so that we don't have to connect to MongoDB every time we make a request. This is a common pattern in serverless applications, and it's a good idea in order to avoid hitting connection limits. You'll see that we receive the user's prompt, or question, from the request body, and then we use LangChain's MongoDBAtlasVectorSearch to create a vector embedding for the user's question. Remember, we have to create a vector embedding for the user's input too, so that we can compare it with the vector embeddings we've stored in MongoDB for our custom data. Then we tell it which collection, index name, text key, and embedding key to use, and we do the search. We're using something called maximal marginal relevance, or MMR, to find the related documents; we can specify the number of results to fetch and how many of those top results to return, which lets us refine how accurate we want to be. Finally, we return the retriever output.

Now that we have a route dedicated to vector search, we can update our chat route to use it. We go back to our chat route, api/chat/route.ts, and replace this section right here with a whole new one. We're using the same code as before, but now we call the vector search route to get the context sections, and then use those context sections to create a template for the AI to answer the question. You can see we first extract the messages payload from the incoming JSON request and retrieve the content of the current message for further processing. Then there's a call to the vector search route using the fetch API: the current message content is sent as the body of the request, and the response is awaited to obtain the vector search results.

Then we have the template; this is all one string, created for the AI to respond to. You can see it says: you are a very enthusiastic freeCodeCamp.org representative who loves to help people; given the following sections from the freeCodeCamp.org contributor documentation, answer the question using only that information, outputted in markdown format; if you are unsure and the answer is not explicitly written in the documentation, say, "Sorry, I don't know how to help with that." Then we have the context sections, which are what came back from the vector search. So we've already found the part of the documentation that relates to what the user just asked, and we include that part of the documentation here. And then we have the question, which is what the user asked. So we're going to get the answer to the question using only the context we got back from the vector search request.
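Here's a hedged Python sketch of those two pieces: an MMR retrieval against the vector store, then stitching the retrieved chunks into a grounding prompt. The template wording follows the walkthrough; the parameter values and field names are assumptions:

```python
import os
from pymongo import MongoClient
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import MongoDBAtlasVectorSearch

collection = MongoClient(os.environ["MONGODB_ATLAS_URI"])["docs"]["embeddings"]

vector_store = MongoDBAtlasVectorSearch(
    collection,
    OpenAIEmbeddings(),
    index_name="default",       # which Atlas search index to query
    text_key="text",            # field holding the raw chunk text (assumed)
    embedding_key="embedding",  # field holding the stored vectors
)

question = "What do I need to know to start contributing to freeCodeCamp?"

# Maximal marginal relevance: fetch a wider pool of candidates, then pick
# the k results that are relevant to the query but not redundant with each other.
docs = vector_store.max_marginal_relevance_search(question, k=3, fetch_k=20)
context_sections = "\n---\n".join(d.page_content for d in docs)

# Ground the LLM's answer in the retrieved documentation.
template = f"""You are a very enthusiastic freeCodeCamp.org representative who loves to help people!
Given the following sections from the freeCodeCamp.org contributor documentation,
answer the question using only that information, outputted in markdown format.
If you are unsure and the answer is not explicitly written in the documentation, say,
"Sorry, I don't know how to help with that."

Context sections:
{context_sections}

Question:
{question}
"""
```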
Then, here, we can see that the content of the current message is updated with the newly created template, and a LangChain stream is initialized to manage the flow of text data. Now we have an instance of ChatOpenAI; we specify the GPT model and enable streaming mode. This part is just like before: the messages, including the updated template, are processed by the GPT model using the LangChain call method, and then we just return the streaming text response. So let's test this out with npm run dev. "What do I need to know to start contributing to freeCodeCamp?" Okay: freeCodeCamp runs on a modern JavaScript stack; if you're interested in contributing to the codebase, you will need some familiarity with JavaScript and some of the technologies used, like Node.js, MongoDB, OAuth 2.0, React, Gatsby, and webpack. That was helpful. "What is the process for creating a pull request?" Okay, and we got the steps for creating a pull request. So now I could keep asking more questions based on the actual freeCodeCamp.org documentation. We've reached the end of the course. You should now know enough to implement vector search in your own projects. Thanks for watching.
Info
Channel: freeCodeCamp.org
Views: 171,429
Id: JEBDfGqrAUA
Length: 71min 46sec (4306 seconds)
Published: Mon Dec 11 2023