Llama Index 101 with Vector DBs and GPT 3.5

Video Statistics and Information

Captions
Today we're going to take a look at how we can use LlamaIndex in production with Pinecone. This is an introduction to the LlamaIndex library, previously known as GPT Index. We're not going to go into any of the more advanced features of the library; we're just going to see how to get started with it, and do that in a way that is more production friendly, with a vector database like Pinecone.

For those of you that don't know, LlamaIndex is a library that helps us build a better retrieval augmentation pipeline for our LLMs. We use retrieval augmentation when we want to give our LLM source knowledge, so knowledge from the outside world, or maybe some internal database, or something along those lines. That helps us, one, reference that knowledge so we can add in citations and things like that, and it also helps us reduce the likelihood of hallucinations. LlamaIndex is a library that supports us in doing that. LlamaIndex can do a lot of things, and we're not going to cover all of them in this video, but the main features of the library include the data loaders, which allow us to very easily extract data from APIs, PDFs, SQL databases, CSVs, all of the most common types of data sources. It also gives us some more advanced ways of structuring our data, so we can add in connections between different data sources, which is quite useful. Imagine you have a load of chunks of text from PDFs: you can add connections between those chunks, so the first chunk in your database would be connected to the next chunk with a little connector that says "this is the next chunk" and "this is the previous chunk". It also supports things like post-retrieval re-ranking. So there's plenty to talk about, but first let's get started with a simple introduction to the library.

We're going to walk through this notebook; there will be a link to it at the top of the video. The first thing we need to do is install the prerequisite libraries, so go ahead and run that. For the runtime here we don't need to be using a GPU, and it costs money to use a GPU on Colab, so you can set the hardware accelerator to None to save that money. Once you've done that, we're going to download a dataset: SQuAD, the Stanford Question Answering Dataset. There are a few things I'm doing here. First, I'm getting the relevant columns: the ID, the context (which is a chunk of text), and the title (basically the page title that the context comes from). Then I'm dropping duplicates: in the SQuAD dataset you'll have something like 20 different questions whose 20 contexts are all identical, so you end up with a lot of duplicated contexts, and because we're only using the contexts we need to remove that duplication. After that we have our ID (the document or context ID), the context itself, and where it comes from. The first few records are all from the University of Notre Dame page, and in total we have almost 19,000 records.

LlamaIndex uses Document objects, which you can think of as revolving around the context of your data, this chunk of text.
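As a rough sketch, the data preparation described above might look like this, assuming the Hugging Face datasets package and the llama_index Document API from around the time of the video (import paths have since moved in newer releases):

```python
from datasets import load_dataset
import pandas as pd
from llama_index import Document

# Load the Stanford Question Answering Dataset (SQuAD)
data = load_dataset("squad", split="train")
df = pd.DataFrame(data)

# Keep only the columns we need, then drop the duplicated contexts
df = df[["id", "context", "title"]].drop_duplicates(subset="context")

# Build one LlamaIndex Document per unique context
docs = [
    Document(
        text=row["context"],
        doc_id=row["id"],
        extra_info={"title": row["title"]},
    )
    for _, row in df.iterrows()
]
print(len(docs))  # roughly 19,000 unique contexts
```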
A Document also includes other bits of information around that context. For us it includes the document ID; every document needs an ID. Optionally, we can also add extra info, which you can think of as metadata for our context. For us that's just the title, but this is a dictionary, so we could add something else too; you can put as many fields as you like in there. Let's run that and take a look at one of those documents, because you can think of this as the core object for LlamaIndex. We have the text, and going through it we have the document ID and the extra info. There's also an embedding field; we don't have an embedding yet, we're going to create that later, but the embedding is very important because it's what will allow us to search through the dataset later on.

Now we need to actually create those embeddings, and for that we're going to use OpenAI, so you'll need an OpenAI API key from platform.openai.com, which you put in here. I've already done that, so I'll move on.

One step further from a Document is what we call a Node. The way I think of a Node is: it's your Document object plus extra information about that document in relation to the other documents in what will become your database. Say you have chunks of text from a PDF: a Node will contain the information that chunk one is followed by chunk two, and node two will record that chunk one was the preceding chunk. It has that relational information between the chunks, whereas a Document does not; we would need to add that ourselves. We're not going to do that here (we'll talk about it in the future), but we still need to use Nodes, so let's run this. Our Nodes in this case carry basically the same information as our Documents, but Node is the object type we build our vector database from. I should say we've already set our OpenAI API key; we don't actually need it yet, and I should really have done that later, but it's there now, ready for when we do want to use it. So we've just created all the Nodes from the Documents; let's take a look at them. Naturally, we have the same number of Nodes as Documents.

Now, we're going to use Pinecone, which is a managed vector database, as the database for our LlamaIndex data. To use it we need our API key and environment, which we get from app.pinecone.io. Once you're in, you should see "API Keys" over on the left; copy your API key and note your environment, which for me is us-west1-gcp. The API key goes in here, and the environment here. After running that, let me walk you through what's going on. We initialize our connection to Pinecone and create our Pinecone index; I'm calling it "llama-index-intro", but you can call it whatever you want.
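A sketch of that node-parsing step, under the same llama_index-version assumption (SimpleNodeParser was the default parser in that release):

```python
from llama_index.node_parser import SimpleNodeParser

# Parse the Document objects into Node objects
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(docs)

# Here each context is short enough that nodes map 1:1 to documents
print(len(nodes), len(docs))
```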
The things we need to do are: first, create the index if it doesn't already exist, which, if you're running this for the first time, it won't. To create the index you need to make sure the dimensionality matches text-embedding-ada-002, the embedding model we're using, and that dimensionality is 1536. We also need to make sure we're using the right metric. We can actually use any metric here, euclidean, dot product, or cosine, but I think cosine is the fastest similarity calculation when you're using text-embedding-ada-002, although in reality the difference between them is practically nil, so you can use any, but I recommend cosine. After that we just connect to the index. So here we're connecting to Pinecone, creating the index, and then connecting to that index.

Once that's done, we connect to it through the vector store abstraction in LlamaIndex. That's pretty simple: PineconeVectorStore, and we just pass in our index. That's it. This allows us to use the other LlamaIndex components with our Pinecone vector store.

Then we have a few more things going on here, so let me talk through all of it and make it a little more readable. What we want to do here is create our index, the GPTVectorStoreIndex. We take all of our documents, the service context (which is like your embedding pipeline), and the storage context (which is the vector store itself), and the index essentially acts as a pipeline around them: it takes all of our documents, feeds them through the embedding pipeline (the service context) to embed them all, and puts them into our vector store. In reality it's pretty straightforward; let me explain it from the perspective of where we actually initialize these. StorageContext.from_defaults is really simple: we're just passing our vector store. There are other parameters here, but we don't need any of them because we're using the vector store with its default settings. The ServiceContext, like I said, is the embedding pipeline; again, we don't need to specify much, just that we're using OpenAI embeddings. This will automatically pull in the API key we set earlier. We do need to set the model: text-embedding-ada-002 is, at the time of recording, the recommended model from OpenAI. And we have the embedding batch size, which is one important thing you should set: with it, everything is embedded in batches of 100.
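Putting those pieces together, a minimal sketch of the whole indexing pipeline, assuming pinecone-client v2 and the llama_index ~0.6 API used in the video (both libraries have since reorganized these interfaces):

```python
import os
import pinecone
from llama_index import GPTVectorStoreIndex, ServiceContext, StorageContext
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores import PineconeVectorStore

# Assumes PINECONE_API_KEY and OPENAI_API_KEY are set in the environment
pinecone.init(
    api_key=os.environ["PINECONE_API_KEY"],
    environment="us-west1-gcp",  # use your own environment here
)

# Create the index if it does not already exist:
# 1536 dimensions to match text-embedding-ada-002, cosine metric
if "llama-index-intro" not in pinecone.list_indexes():
    pinecone.create_index("llama-index-intro", dimension=1536, metric="cosine")
pinecone_index = pinecone.Index("llama-index-intro")

# Wrap the Pinecone index in LlamaIndex's vector store abstraction
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# The "embedding pipeline": OpenAI embeddings, batched 100 at a time
embed_model = OpenAIEmbedding(
    model="text-embedding-ada-002",
    embed_batch_size=100,
)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

# Embed all nodes and upsert them into Pinecone
index = GPTVectorStoreIndex(
    nodes,
    storage_context=storage_context,
    service_context=service_context,
)
```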
I think by default the value is much smaller, 32 or 16 or something like that. That means it would take, say, 16 chunks of text, send them to OpenAI, get the embeddings, and pass them on to the storage context, which upserts them to Pinecone. What we've done here is set the embedding batch size to 100, so it takes 100, sends them to OpenAI, then sends them to Pinecone. That means you make fewer requests, roughly six times fewer, which in essence makes you roughly six times faster, because the majority of the wait time in these API requests is network latency: making the request and receiving the response. Increasing the batch size means things go faster, which I think is what we all want. With that we initialize our service context, the embedding pipeline (or we can even think of it as a preprocessing pipeline for our data), and then we set everything up together: that's our full indexing pipeline, and we can initialize it.

Now, this can take a long time, and unfortunately LlamaIndex doesn't have a progress bar or anything built into it, but we can check the progress of the index creation in Pinecone. Under our llama-index-intro index we can go to the index info and see the total number of vectors currently in there, along with the rate at which they're being upserted, and refresh to see where we are. We're at 4,300, and we still need to upsert quite a few, so it's going to take a little while. What I might do is stop it for now and jump ahead to asking questions, so I'm not waiting too long. That's just one of the unfortunate things with LlamaIndex, but we can work around it by looking at the Pinecone dashboard to see how many vectors we currently have.

Right now it is very slow to do this with LlamaIndex. If you just want to get your vectors and documents in there, I would use Pinecone directly; it's much faster. For the 18,000 or 16,000 records, whatever that number is, you'd be waiting maybe a couple of minutes at most, because you only need to embed things with OpenAI and then send them to Pinecone; a few minutes if you set that code up properly. But then you wouldn't benefit from the other things LlamaIndex offers, so in some cases it might just be a case of being patient, and I'd bet the embedding process in LlamaIndex will be optimized in the near future, so hopefully it won't take quite as long to upsert everything.

From here, let's pretend we've upserted everything, and now we want to build our query engine. The query engine is basically just the index with a method called as_query_engine, which reformats the index into something we can begin querying. So we have our query engine, then we do query_engine.query, and our question is going to be "In what year was the College of Engineering established at the University of Notre Dame?" We saw that the first few items in the dataset were about the University of Notre Dame, so we'd expect that to work.
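A sketch of the progress check and the querying step, under the same version assumptions:

```python
# Check upsert progress directly from Pinecone (vector count etc.)
print(pinecone_index.describe_index_stats())

# Build a query engine from the index and ask a question
query_engine = index.as_query_engine()
response = query_engine.query(
    "In what year was the College of Engineering established "
    "at the University of Notre Dame?"
)
print(response)
```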
Hmm, okay. It looks like the index hasn't actually initialized properly, because I stopped it midway through, so what I'll do is quickly upsert just 100 or so documents, so it's a bit quicker. Let's check: okay, we still have all those other documents in there, so now let's try that again. And we get: the College of Engineering was established in 1920. I'm sure it's one of the first items in the dataset; it's probably where I got the question from. Yes, row four, I think. Let's take a look at that: data[4]... oops, it's a DataFrame, so it should be data.loc[4]. And we can have a look at the context: it's pulling this information, "established in 1920". Cool.

So that's how we set up LlamaIndex with a vector database like Pinecone. Once we're done with that, or if you want to ask some more questions, obviously go ahead and do that, but once you are done and you're not going to use the index again, delete the index so we're not wasting resources, and, at least if you're on the free tier, you can use that index slot for something else afterwards.

That's it for this video. I just wanted to very quickly introduce LlamaIndex and how we would use it. Of course, like I said at the start, there is a lot more to LlamaIndex than what I'm showing here; this is very simply an introduction to the library. Anyway, I hope this has all been useful and interesting, so thank you very much for watching, and I will see you again in the next one.
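For reference, the cleanup step mentioned at the end of the transcript is a one-liner in pinecone-client v2:

```python
# Delete the index once you're finished with it to free up resources
pinecone.delete_index("llama-index-intro")
```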
Info
Channel: James Briggs
Views: 38,123
Keywords: python, machine learning, artificial intelligence, natural language processing, nlp, semantic search, similarity search, vector similarity search, vector search, llama index, llama index vs langchain, llama index tutorial, llama index explained, llms, llms tutorial, vector database, pinecone ai, pinecone api key, gpt 3.5, gpt 4, langchain, llm course, langchain course, gpt-3.5-turbo, chatbot python, chatgpt, chat gpt, ai, openai api, gpt index, gpt index tutorial, gpt chat
Id: WKvAWub8VCU
Length: 18min 40sec (1120 seconds)
Published: Wed Jun 07 2023