Okay. So in this video, I'm going to look at building a LangChain app that can be used to query a Chroma database. We're going to be using text files and an EPUB as the sources of information, so we've got multiple kinds of documents going into the vector store. First off, I'm going to walk through building it with OpenAI, and then after that I'm going to show you a version of how you could do this without OpenAI. I'm not necessarily going to say that the version without OpenAI is better; you can look at the outputs yourself and decide. So let's jump into the version with OpenAI.

All right. So here we're going to be using just some standard stuff. You can see I'm bringing in LangChain, I'm bringing in OpenAI, and I'm bringing in ChromaDB for the vector store. Some other new things we're bringing in are the unstructured package and Pandoc; these are used for loading the EPUB file and interpreting it into something we can work with.
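If you're following along in Colab, the install cell would be something roughly like this (a sketch: the exact package list and the system-level pandoc install are assumptions based on my setup):

```python
# Colab-style install cell (a sketch; pin versions as needed)
!pip -q install langchain openai chromadb tiktoken unstructured pypandoc
# unstructured's EPUB handling relies on the pandoc binary being available
!apt-get -qq install -y pandoc
```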
Okay. So the topic: we're building basically a chatbot where you can ask questions about a particular topic, and it's going to use a variety of different sources to pull the answers back. So we're going to have a retriever doing vector store retrieval in there. The topic I'm actually using is a bunch of videos by Ash Maurya and his book "Running Lean", which I have in EPUB format. Obviously I can't give you his book in EPUB format; you're going to need to find your own EPUB, pick your own topic, choose your own videos, et cetera. The text files we're bringing in are just transcripts from particular YouTube videos of him being interviewed and talking. So a large chunk of the information is coming from those, and the rest is coming from his book "Running Lean".
Okay. So we have the standard LangChain setup: we put in our OpenAI key here, we're going to be using Chroma for the vector store, we've got a standard text splitter, and I'm going to use the chat models from OpenAI, so the GPT-3.5 Turbo API. I'm also going to be using the text loader to bring in the text files.

You can see that once I've downloaded those files, bringing them in is pretty simple: I basically just go to that folder, grab all the files that are actually text files, and run them through the text loader. In this case there weren't that many; there were only three of them.
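The loading step looks roughly like this (a sketch; the folder name is an assumption for wherever you've put your transcripts):

```python
import glob
from langchain.document_loaders import TextLoader

# Run every .txt transcript in the folder through the text loader
docs = []
for path in glob.glob("./ash_maurya_texts/*.txt"):
    docs.extend(TextLoader(path).load())

print(len(docs))  # 3 transcript files in my case
```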
For the EPUB, this is also pretty simple. I've just loaded the EPUB into Colab and I'm passing it into this UnstructuredEPubLoader. It loads it up and handles all the EPUB parsing for us, and we can see that we end up with basically one EPUB document.
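The EPUB loading is just a couple of lines (a sketch; the filename is an assumption for whatever EPUB you upload to Colab):

```python
from langchain.document_loaders import UnstructuredEPubLoader

epub_loader = UnstructuredEPubLoader("running_lean.epub")
epub_docs = epub_loader.load()
print(len(epub_docs))  # the whole book comes back as one document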
Now we want to split everything into chunks. For the splitting we're basically going to use the same settings as in one of the previous videos, with a chunk size of 1000. You can play around with this a lot; it's one of the things you want to think about for your particular use case. Maybe you go for a smaller chunk size with a bigger overlap than the one I'm using here.

You can see I'm splitting the documents, which are the text files, into texts_01, and splitting the EPUB into texts_02. When they come out, the transcripts are split into 112 chunks and the book itself is split into 480 chunks. Because these are just Python lists, we can simply add them together, and we end up with a list of 592 chunks. If we go in and look at these, they're just documents: each one is a chunk of text, maybe with a link in it, et cetera. This is what we're actually going to embed.
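The splitting and combining step is roughly this (a sketch; the overlap value is an assumption, tune it for your use case):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

texts_01 = text_splitter.split_documents(docs)       # the transcript files -> 112 chunks
texts_02 = text_splitter.split_documents(epub_docs)  # the book -> 480 chunks

# Both are plain Python lists of Document chunks, so we can just concatenate them
texts = texts_01 + texts_02
print(len(texts))  # 592 chunks total
```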
Now, because we're using all OpenAI in this case, we're going to use the OpenAI embeddings. Honestly, personally, I'd probably prefer to use the instruct embeddings rather than the OpenAI embeddings. The challenge is this: even if we're using OpenAI for the language model, by using the instruct embeddings we're not tied to OpenAI. We could always swap out the language model later, but if we've gone for the OpenAI embeddings, we can't just swap in another embedding system without re-indexing all of our data. So even though I'm showing the OpenAI embeddings here, because this notebook is basically all OpenAI, normally I would probably go for the instruct embeddings.
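Setting the embeddings up is one line either way (a sketch; the instructor model name shown commented out is the open alternative I'd normally reach for):

```python
from langchain.embeddings import OpenAIEmbeddings
# from langchain.embeddings import HuggingFaceInstructEmbeddings

embeddings = OpenAIEmbeddings()
# The open alternative, so you're not tied to OpenAI for re-indexing later:
# embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
```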
Next we're basically just creating our database. If you haven't seen any of this before, look at the earlier videos where I explain it more in depth. We've got a Chroma database; we're passing in the embeddings here, so that's the embedding function to be used; we're passing in the documents; and we're persisting it all to a directory.

We then make a retriever, and we can test it. If we ask it "What is product market fit?", we can see that it's returned four chunks. If we look at the first chunk, sure enough the answer looks to be in there, so it looks like the embeddings are working. For the actual end system we're going to set up our retriever to return just three chunks. Again, this is something you'd want to test and see whether it's right for your use case or not.
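Building and persisting the Chroma store, then wrapping it as a retriever, looks roughly like this (a sketch; the persist directory name is an assumption):

```python
from langchain.vectorstores import Chroma

persist_directory = "db"

# Embed all the chunks and write the index to disk
vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embeddings,
                                 persist_directory=persist_directory)
vectordb.persist()

# Test retrieval - the default retriever returns 4 chunks
retriever = vectordb.as_retriever()
print(len(retriever.get_relevant_documents("What is product market fit?")))

# For the actual chain, only pull back 3 chunks
retriever = vectordb.as_retriever(search_kwargs={"k": 3})
```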
Next up, we're going to make the actual chain. We're just using a RetrievalQA chain. In this case we're going to use the ChatOpenAI language model, and we pass that in here. We're just going to "stuff" the whole thing in, so we're not doing any map-reduce or anything fancy like that. And we've got our retriever that we defined before.
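The chain itself is a standard RetrievalQA with the "stuff" chain type (a sketch; the temperature value is an assumption):

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# "stuff" just packs the retrieved chunks straight into the prompt - no map-reduce
qa_chain = RetrievalQA.from_chain_type(llm=llm,
                                       chain_type="stuff",
                                       retriever=retriever,
                                       return_source_documents=True)
```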
Now, one of the issues I found when I was first playing around with this is that I really want the answers to sound like they're coming from him. So if we're asking a question like "How do I do this with a startup?", "How do I do customer interviews?", or "What's important about pivoting?", you really want, at least in my case, the answers to feel as if they're coming from him and not from some third party.

To do that, I've changed the prompt in this QA chain. The original prompt was: "Use the following pieces of context to answer the user's question. If you don't know the answer, just say you don't know" - and that's all it had. The challenge with that is you will often get answers like "As a large language model, I can't answer this" or "As a large language model, I don't find the answer in here". Whereas you really want him to answer it and say something like "Well, that's not what we're talking about" or "I don't know the answer to that question".

So I've played with the prompt here. You can see the prompt I'm putting in is: "Your name is Ash Maurya. You're an expert on lean startups. Use the following pieces of context to answer the user's question" - so that part I haven't changed - "don't make up the answer", and then I've also added "Always answer from the perspective of being Ash Maurya". The idea is that, with this, we always want the answer to come from him and never just from the language model itself.
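Swapping the persona prompt in looks roughly like this; reaching into the chain internals this way is a sketch and the exact attribute path can vary between LangChain versions, but the wording matches what I described:

```python
persona_template = """Your name is Ash Maurya and you are an expert on lean startups.
Use the following pieces of context to answer the user's question.
If you don't know the answer, just say that you don't know - don't make up an answer.
Always answer from the perspective of being Ash Maurya.
----------------
{context}"""

# With a chat model, the context sits in the system message of the stuff chain
qa_chain.combine_documents_chain.llm_chain.prompt.messages[0].prompt.template = persona_template
```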
All right. So we've got that, plus some helper functions for printing out the results. Then we come down and run it. The first question is "What is product market fit?", and it's done a nice job here: "Product market fit is the point where a startup's product or service satisfies the needs of the market", and it goes on a little bit. Now, you will get slightly different answers each time you run this. Honestly, for me, I think the answers were better when I was using the instruct embeddings rather than the OpenAI embeddings, looking at this now. We've also got sources coming back, so we can see which chunks contributed to this answer: the book was used, and then also two of the videos.

"When should I quit or pivot?" Here you can see that, again, it's got an answer, and the answers read as if they're coming from him. That's what we want: it's answering in his voice, in his style.

Next one: "What is the purpose of a customer interview?" We get the information there.

What about if I ask it his name? If I ask "What is your name?", it answers from his perspective and gives his name. There's another one about interviewing techniques as well.

And then I want you to see an example of what I get back if I deliberately ask something that's off topic. So I asked, "Do you like the color blue?" My guess is that there's no way he refers to the color blue anywhere in the source material. The answer we get back is: "I'm sorry, but my personal preferences are not relevant to the topic at hand. Is there anything related to lean startups or entrepreneurship that I can assist you with?" This is exactly what I was talking about: we don't want it to say "Oh, I've looked at all the context and there's nothing in there" or "As a large language model, I don't have a good answer", that kind of thing. This is a way to try and keep it on track, as if you're actually talking to the person themselves.

Finally, we can ask, "What books did you write?" You can see again that it gives his name and then says it has written two books on lean startup methodology: the first, called "Running Lean", published in 2010, and the second, "Scaling Lean", published in 2016. So it's pulled those details from the book itself.
So that's how we would do it with OpenAI. The next thing we're going to look at is doing it with open source models and basically no OpenAI at all.

Okay. So this is the open source version. You're going to see that there's not a huge amount of changes; basically we're just swapping out models. The model we're going to be using for the language model is the StableVicuna model, so if I come down here, the big difference is just that we're now loading in StableVicuna 13B.

Now, in trying to get this to work, I've gone through ten different models on the Hugging Face Hub, and to be honest, most of them have been total crap. The challenge is that you've either got models that are fine-tuned for being chat models, and so they work quite nicely for chat-style tasks, but when they actually have to take in some information, they don't do very well with that. The other problem is that some of the models require custom code to run, which makes it much harder to get them running as a Hugging Face pipeline for LangChain, so that also rules out some models for this particular task.

Some of the models I didn't try out were the LLaMA models, so it's quite possible that models like Vicuna, some of the Wizard models, and other fine-tuned versions of the LLaMA models may do well, especially the bigger ones. I also didn't want to pick a really big model for this, because I know most people won't be able to serve one; I was trying to get something around 7 billion parameters, and in the end I had to settle for something around 13 billion. Some of the models that showed a lot of promise were things like the LaMini models, but when you actually look at the output coming back from the questions, it just wasn't good enough. Or you would find that they have too short a context span: some of those models are actually not bad, but their context size is only 512 tokens. That might be okay for certain kinds of tasks, but for this one, where I wanted a decent prompt describing the character or person that's going to be responding, it turned out that would fill up the token budget very quickly.
So, okay, the first thing I'm doing is bringing in the StableVicuna model here. We set up the generation pipeline, and we just wrap it as a local LLM with the Hugging Face pipeline. I've made videos about this before, so I'm not going to go too in-depth into these kinds of things.
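The pipeline setup follows the same pattern as my earlier local-model videos (a sketch; the exact model repo id and the generation settings are assumptions, adjust them for your GPU):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain.llms import HuggingFacePipeline

model_id = "TheBloke/stable-vicuna-13B-HF"  # assumption: any StableVicuna 13B HF repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             torch_dtype=torch.float16,
                                             device_map="auto")

pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer,
                max_new_tokens=256,
                do_sample=True,
                temperature=0.1)

# Wrap the transformers pipeline as a LangChain LLM
local_llm = HuggingFacePipeline(pipeline=pipe)
```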
I can see sure enough, you know, it's generating out something on task. The challenge with this model
is you get a lot of these. You know, hash, hash, hash human
hash, hash, hash assistant. and in the end I have to write a
function to basically filter this out in the responses coming back for this. Now, ideally, I'd like to bake that into
the pipeline at some point, if I was going to actually use this in production. loading up the data is exactly the same. There's no difference there. I'm loading up the text files. I'm loading up the ePubs there. Doing the splitting again is
exactly the same as before. We then basically do the embeddings. Now the embeddings that I chose here
Now, the embeddings I chose here are the e5-large-v2 embeddings. Okay, so how did I pick the embeddings to use? Normally I would use the Instructor-XL embeddings. The reason I decided to change is that they're now number two on the MTEB leaderboard on Hugging Face; you can see they've been surpassed by this e5-large model, so I thought, okay, we'll try that out. If you look at my other video and Colab, I'm using the Instructor-XL one, and most of the time I'd probably still lean towards that, but I wanted to try this one out. You can also see that the embedding dimension for these is larger.

The code for putting this in is quite simple: we bring the model in, declare it as a Hugging Face embedding, and pass the model name in so it's loaded as a sentence-transformer embedding. You'll see that it actually has to convert the model to what we want: the way it's published, it's set up as a normal transformer, so it gets converted to an embedding transformer by adding mean pooling on top. That's what I would expect it to do anyway; normally that's what the sentence-transformers library does for most models.
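Wiring the e5 embeddings in is essentially one line (a sketch):

```python
from langchain.embeddings import HuggingFaceEmbeddings

# Loaded via sentence-transformers, which adds the mean pooling for us
embeddings = HuggingFaceEmbeddings(model_name="intfloat/e5-large-v2")
```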
Creating the database: nothing different there. We're still using a Chroma retriever; nothing different there either. The only thing is that I've set k=2 rather than k=3, just so we're not using too many tokens. A lot of the reason for that is that I was trying some of the models with 512-token contexts; for StableVicuna you could actually go back to k=3 here. And where we had the LLM being ChatOpenAI before, I've commented that out, and I'm now using the local LLM, which is the StableVicuna one we set up before.
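So the chain setup is the same as before, just with k=2 and the local model swapped in (a sketch):

```python
retriever = vectordb.as_retriever(search_kwargs={"k": 2})

# llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)  # what we used before
qa_chain = RetrievalQA.from_chain_type(llm=local_llm,
                                       chain_type="stuff",
                                       retriever=retriever,
                                       return_source_documents=True)
```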
If we look at the prompt, we can see what it was like before, and we need to change it again. Because we're not using the OpenAI chat model, accessing the prompt is different. Before, it was one of the messages, sitting in the system message; here, we're putting everything into a single prompt template, and because of that you'll see we've also got the context and the question in there together. Now, you can play around with this; I actually changed the "Helpful answer" part to "AI". Play around with it and you might get better results.
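For a plain (non-chat) LLM the whole thing goes into a single PromptTemplate, with the context and the question together (a sketch; the exact wording is something to play with, as I said):

```python
from langchain.prompts import PromptTemplate

template = """Your name is Ash Maurya and you are an expert on lean startups.
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know - don't make up an answer.
Always answer from the perspective of being Ash Maurya.

{context}

Question: {question}
AI:"""

qa_chain.combine_documents_chain.llm_chain.prompt = PromptTemplate(
    template=template, input_variables=["context", "question"])
```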
I've also got a little function for trimming out everything after the "### Human" marker, plus our functions for processing the response.
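The trimming helper is tiny (a sketch, based on the marker StableVicuna keeps appending):

```python
def clean_response(text: str) -> str:
    # Cut off everything from the first "### Human" marker onwards
    return text.split("### Human")[0].strip()
```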
And then we can actually use it. So here, the first question: "What is product market fit?" It gives us a reasonably decent answer, and we can see the two sources it came back from.

Next question: "When should we quit or pivot?" Again, we're getting reasonable answers that certainly make sense. They're probably not as long as the OpenAI answers, partly because we're only getting two contexts back rather than three.

You'll also see that some of these answers just don't do that well. Here is "What is the purpose of a customer interview?" It does give us an answer about getting feedback from the people who might actually buy, but then it also tacks on an "unhelpful answer" of "to validate your business model" - why did it generate that?

It gets the name right, so it's got the personality a little bit, but you'll see that it's not good at saying no, whereas the OpenAI models are better at understanding when they should stay away from something. For "Do you like the color blue?", it says "Yeah, I love the color blue. It reminds me of the ocean." Now, that's clearly coming from the language model and not from the actual source material. We can see it pulled out the same sources as before, but the OpenAI model went through those sources, worked out there was nothing relevant, and told us so. This is a good example of this model hallucinating more than the OpenAI model.

Another example is when we ask "What books did you write?" Here it has basically made up some books rather than pulling the results back from the retrieved context; it's just making them up.
So play around with the models, but the real answer is that if you have to do this with an open source model, you probably really want to fine-tune it. Maybe that's something we can look at: using a fine-tuned model for this particular sort of retrieval, or what we'll often call a RAG model, a retrieval augmented generation model, which means you can fine-tune it specifically for that.

Most of the open source models we see on Hugging Face at the moment, even the ones that are fine-tuned, are fine-tuned for instruct or chat. So they tend to be very good at following instructions and that kind of thing, but they're not good at using tools, they're not good at producing reasoning-style outputs, and they're generally not as good at these retrieval augmented generation responses.

Anyway, this gives you a good comparison of OpenAI versus open source. Perhaps in the future we can also look at doing something like this with PaLM or some of the other big commercial models, to see how well they do at this particular task. A lot of this will come down to you playing around with the prompt to get the kind of prompt you want.

Anyway, as always, if you've got questions, please put them in the comments below. If you've found the video useful, please click like and subscribe, and I will talk to you in the next video. Bye for now.