Understanding Embeddings in RAG and How to use them - Llama-Index

Captions
This is the second video in our LlamaIndex series, where we look at the different components of a document Q&A system and see how to improve them. Today we are focusing on embedding models. In this video we will understand what embeddings are, how to use them, and what types of embeddings are available within the LlamaIndex framework. We will look at how to use OpenAI's embeddings as well as other open-source embeddings in the LlamaIndex pipeline, and at the end we will benchmark different embedding models based on their speed.

If you are building a system to chat with your documents, you need to consider three different components. The first one is how you pre-process your documents, mainly the chunking process; I have a dedicated video on that, and the link is in the description. The second thing to consider is the type of embedding model you are using, and the last component is the LLM you use to generate the responses. I would personally argue that the document pre-processing step and the embedding model are the most important components of this pipeline.

In order to explain why I believe that, let's first look at what exactly embeddings are. Consider this very simple example: say we have four different words, man, woman, boy, and girl. We can represent each of these words in a two-dimensional space whose features are age and gender. If you look at this simple two-dimensional representation, you can see that a boy is semantically closer to a man than to the other two words. In this case, age and gender are called semantic features, and you can represent them with numerical values as shown here. You can add more words to this two-dimensional semantic space, and you will start to see a pattern: words that are semantically close to each other are also close in this semantic feature space. For example, a grandfather is closer to a man than to a woman. The beauty is that you can add more features to this feature space; for example, if we also add royalty, this becomes a three-dimensional space, and you can represent each word as a vector of three dimensions.

The best part is that you can actually do arithmetic on these semantic representations. For example, if you take the vector that represents a king and subtract the vector for man, the resulting vector is going to be closer to woman. Similarly, if you subtract man from king and then add woman, you get a vector that is very close to queen (a small numeric sketch of this arithmetic follows below).

Now you might be thinking: yes, this makes sense, but the features we see here are ones we defined ourselves. The key is that we can train neural networks that come up with their own feature representations while preserving the semantic meaning of different words or sentences. These multi-dimensional feature vectors are what we call embeddings. That means you can take word embeddings for a sentence, for example "I want to cancel my shoe order", compute the word embedding for each word, and combine them to get a sentence embedding. The main point is that similar sentences are going to have similar embeddings.
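To make the king/man/woman/queen arithmetic concrete, here is a small, self-contained sketch using hand-picked three-dimensional vectors for the age/gender/royalty features described above. The numbers are purely illustrative (real embeddings are learned by a model, not written by hand), and cosine similarity is used to find the closest word.

```python
import numpy as np

# Hypothetical 3-dimensional semantic features: [age, gender, royalty]
# (illustrative values only; real embeddings are learned, not hand-crafted)
words = {
    "man":   np.array([0.9, 1.0, 0.0]),
    "woman": np.array([0.9, 0.0, 0.0]),
    "boy":   np.array([0.2, 1.0, 0.0]),
    "girl":  np.array([0.2, 0.0, 0.0]),
    "king":  np.array([0.9, 1.0, 1.0]),
    "queen": np.array([0.9, 0.0, 1.0]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means the vectors point in exactly the same direction.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# king - man + woman should land closest to queen
result = words["king"] - words["man"] + words["woman"]
closest = max(words, key=lambda w: cosine(result, words[w]))
print(closest)  # -> "queen"
```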
So this is great, but how does it work when we create vector stores on top of embeddings? Let me show you a very simple representation of a vector store. A vector store has three different components: the chunk ID, the original chunk text (the text chunk created by splitting the document), and the corresponding embeddings. In a retrieval-augmented generation system, when a new user query comes in, we need to find the closest text chunks based on the embeddings. So we compute embeddings for the user query, compare them against the embeddings of the text chunks in our vector store, and figure out which chunks are closest. As a result we get one or more chunks that are very similar to the query the user provided.

As you can see, this is the most important part, because if the embedding model is not good and cannot retrieve the proper chunks, that will result in bad performance from the LLM. Why? Because we take the query as well as the chunks returned by the embedding model (this could be one chunk or several, depending on how many you want to return) and feed them into the LLM to generate a response. The LLM does not have access to the whole document; it only sees the user query and the chunks returned by the embedding model. That means you really want to pay close attention to what type of embeddings you are using to represent the semantic meaning of your chunks.

Okay, enough theory, let's look at some code. Here I am importing the different packages that I need: the llama-index package, openai, and transformers, because I will also show you how to use open-source embeddings, including Instructor embeddings and BGE embeddings, along with OpenAI embeddings. We also need some other packages, for example accelerate, InstructorEmbedding, sentence-transformers, and pypdf in order to load PDF files.

We will first start with OpenAI's embeddings, so for that we need the OpenAI API key, plus some other imports: the OpenAI model, our VectorStoreIndex, the SimpleDirectoryReader to load files from a directory, and some other packages for formatting. If you have watched my previous video, I showed you these four lines of code to create a simple document Q&A system: we load our documents, we create a vector store index, we convert it into a query engine, and then we can ask questions. In this case we did not define what type of embeddings we were using, yet this is the part of the pipeline where the documents are converted into chunks and the embeddings are computed. By default, if you provide the OpenAI API key, LlamaIndex uses OpenAI's embeddings.

Let me show you how to change the defaults in LlamaIndex; I covered this in the previous video, and the link is in the description. To change the defaults you use the ServiceContext and pass in the parameters you want to change. For example, if you want to change the LLM, you define the LLM you want to use and pass it to the llm property; similarly, you can change the chunk size, chunk overlap, and so on. These new defaults are then used by the VectorStoreIndex; a minimal sketch of this pipeline follows below. To change the default embedding model, again just as an example, we start by
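As a reference, here is a minimal sketch of the four-line pipeline and of overriding the defaults through a ServiceContext. It follows the llama-index API of roughly the 0.8.x era shown in the video (newer releases replace ServiceContext with Settings); the data directory, LLM name, chunk settings, and the question are placeholder choices, not values from the video.

```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms import OpenAI

# The four-line pipeline: load documents, build an index, get a query engine, ask a question.
# Assumes OPENAI_API_KEY is set, so the default OpenAI embeddings are used.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("I want to cancel my shoe order"))

# Changing the defaults (LLM, chunk size, chunk overlap) through a ServiceContext.
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo"),  # placeholder LLM choice
    chunk_size=512,
    chunk_overlap=64,
)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```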
loading the ServiceContext and importing OpenAI's embeddings. We define a new variable called embed_model, set it to the OpenAI embedding model, and pass that to the ServiceContext. Whenever we use this ServiceContext as part of the VectorStoreIndex, it will use OpenAI's embeddings.

Now let's look at a practical example of what these embeddings look like. I take the OpenAI embeddings object we just created, call the function get_text_embedding, and pass in a simple sentence: "AI is awesome". This returns the embedding computed for that sentence. If we look at the embedding, you can see it is a list of numerical values; these are the different dimensions across which the vector is computed, and we can check how many dimensions this specific embedding vector has. The OpenAI embedding model has 1,536 dimensions, which means each paragraph or chunk is going to be represented by a vector of that size.

As you probably know, whenever you use OpenAI's embedding model you have to pay OpenAI. If we don't want to pay OpenAI, we need to look at some alternatives. The good news is that there are plenty of open-source embedding models out there that are good for different tasks. You can look at the Massive Text Embedding Benchmark (MTEB) leaderboard, which is hosted on Hugging Face Spaces, and pick the embedding model that works best for your use case. For example, at the time of recording the BGE large English model is at the top of the leaderboard. You can also see the number of dimensions each of these embedding models has: that one has 1,024 dimensions, while a smaller model such as BGE small English has only 384 dimensions. The one that I personally like is the Instructor large embedding model, which has 768 dimensions.

Let me show you how to use these embedding models with LlamaIndex. It is as simple as what we did for the OpenAI model: there is a Hugging Face embedding class that you need to import, and then you simply pass in the name of the embedding model you want to use. You can go to the leaderboard, select the embedding model you want, copy its name, and paste it in here. In this case I am using the small model because I am running this on a T4 Colab GPU. Once the embedding model has been downloaded, we use the same get_text_embedding function to compute the embeddings for our text, and you can see that with the small model the embedding size is 384, so we have a vector of 384 dimensions. The next model we will look at is the Instructor embedding model; again, you simply provide the model name from Hugging Face, it downloads the model for you, and you can then check the dimension of the vector itself, which in this case is 768.
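Here is a sketch of computing a single sentence embedding with the three model families discussed above and checking their dimensionality. The BGE and Instructor model names are taken from the Hugging Face hub; the InstructorEmbedding class name is an assumption based on later llama-index releases, and older versions may require a different import path or a LangChain wrapper.

```python
from llama_index.embeddings import OpenAIEmbedding, HuggingFaceEmbedding, InstructorEmbedding

sentence = "AI is awesome"

# OpenAI embeddings (requires OPENAI_API_KEY); local BGE and Instructor models from Hugging Face.
models = [
    ("openai", OpenAIEmbedding()),
    ("bge-small-en", HuggingFaceEmbedding(model_name="BAAI/bge-small-en")),
    ("instructor-large", InstructorEmbedding(model_name="hkunlp/instructor-large")),  # assumed class name
]

for name, model in models:
    vec = model.get_text_embedding(sentence)
    print(name, len(vec))  # expected dimensions: 1536, 384, 768
```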
Next we will look at some benchmarks in terms of computation speed. For the benchmark we are going to use a file that is part of the LlamaIndex documentation examples. This specific file has a total of 172 pages, and we want to see which model is fastest at computing the embeddings. What I have done here is download the PDF file, which has a total size of around 20 megabytes; once the download is finished you can see the PDF file in the file browser.

Next we need to load that document. For that I am using the SimpleDirectoryReader: you can directly provide the file name and it will load that file, and it becomes part of the documents. In this case it converts each page into a separate document, which is why you see a total of 172 documents.

Next we want to create a vector store index based on the documents we just loaded. We will keep everything at the defaults except the embedding model, because the goal is to compare the different embedding models we have seen so far in terms of their computational speed. So let's see how long each model takes to do the embedding computation. The first one is the OpenAI embedding model; we set it as the default model using the ServiceContext, and then I build the vector store index two times in a loop, because we want to look at the mean and standard deviation of the compute time. For this purpose I am using the %%timeit magic function inside the Colab/Jupyter notebook, which runs the embedding computation multiple times and reports the mean and standard deviation of the compute time (a rough sketch of this setup follows below). Keep in mind that OpenAI's embeddings require calls to the OpenAI servers, transferring and receiving data, so they will take longer than any local embedding model. Based on our results it took around 46 seconds on average, with a very small standard deviation. One thing to note here: for 172 pages it actually created 428 chunks, based on the default values for chunk size and overlap.

Now let's compare the compute time with the open-source embedding models we were running locally. The BGE model took around 9 seconds on average, and the Instructor embedding model took around 19 seconds. That means if you are using a local embedding model, you will get the embeddings computed much more quickly than with the OpenAI model, because the OpenAI model has to make calls to the OpenAI servers and its embedding size is much larger than what we are looking at here.
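A rough sketch of the benchmark setup: build the same index with different embedding models and time the runs. The video uses the %%timeit cell magic in Colab; here timeit.repeat plays the same role. The PDF path is a placeholder, and an OpenAI API key is assumed to be set for the OpenAI run.

```python
import timeit
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.embeddings import OpenAIEmbedding, HuggingFaceEmbedding

# Load the PDF; SimpleDirectoryReader creates one Document per page (e.g. 172 for the file in the video).
documents = SimpleDirectoryReader(input_files=["./benchmark.pdf"]).load_data()
print(len(documents))

def build_index(embed_model):
    # Keep all defaults except the embedding model.
    ctx = ServiceContext.from_defaults(embed_model=embed_model)
    return VectorStoreIndex.from_documents(documents, service_context=ctx)

for name, model in [
    ("openai", OpenAIEmbedding()),                                      # remote API calls
    ("bge-small", HuggingFaceEmbedding(model_name="BAAI/bge-small-en")),  # local model
]:
    # Two full index builds per model, mirroring the two-run loop in the video.
    times = timeit.repeat(lambda: build_index(model), number=1, repeat=2)
    print(f"{name}: mean {sum(times) / len(times):.1f}s over {len(times)} runs")
```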
Another thing to keep in mind is that a larger embedding vector does not necessarily mean better embeddings. In an upcoming video we will compare different embeddings on the information retrieval task, looking at different embedding types across different types of tasks to figure out which embeddings work best for which task.

Just a quick recap: we looked at what embeddings are, how they work, why they are important for information retrieval, and what role they play in generating responses with LLMs. We also looked at examples of how to use them in LlamaIndex. If you want to understand LlamaIndex in more detail, I would recommend watching this video. Thanks for watching, and see you in the next one.
Info
Channel: Prompt Engineering
Views: 13,300
Keywords: prompt engineering, Prompt Engineer, natural language processing, GPT-4, chatgpt for pdf files, ChatGPT for PDF, langchain openai, langchain in python, embeddings stable diffusion, Text Embeddings, langchain demo, long chain tutorial, langchain, langchain javascript, gpt-3, openai, vectorstorage, chroma, train gpt on documents, train gpt on your data, train openai, train openai model, train openai with own data, langchain tutorial, how to train gpt-3, embeddings, langchain ai
Id: v6g8eo86T8A
Length: 16min 18sec (978 seconds)
Published: Thu Sep 28 2023