Okay. So in this video, I'm going to look at building a LangChain app that can be used to query a Chroma database. We're going to be using text files and an EPUB as the sources of information, so we've got multiple kinds of documents going into the vector store. First off, I'm going to walk through building it with OpenAI, and then after that I'm going to show you a version of how you could do this without OpenAI. I'm not necessarily going to say that the version without OpenAI is better; you can look at the outputs yourself and decide. So let's jump into the version with OpenAI.

All right. So here we're going to be using just some standard stuff. You can see I'm bringing in LangChain, I'm bringing in OpenAI, and I'm bringing in ChromaDB for the vector store. Some other new things we're bringing in are the unstructured package and Pandoc; these are used for loading the EPUB file and interpreting it into something we can work with.
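If you're following along in Colab, the install cell would be something roughly like this (a sketch: the exact package list and the system-level pandoc install are assumptions based on my setup):

```python
# Colab-style install cell (a sketch; pin versions as needed)
!pip -q install langchain openai chromadb tiktoken unstructured pypandoc
# unstructured's EPUB handling relies on the pandoc binary being available
!apt-get -qq install -y pandoc
```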
Okay. So the topic: we're building basically a chatbot where you can ask questions about a particular topic, and it's going to use a variety of different sources to pull the answers back. So we're going to have a retriever doing vector store retrieval in there. The topic I'm actually using is a bunch of videos by Ash Maurya and his book "Running Lean", which I have in EPUB format. Obviously I can't give you his book in EPUB format; you're going to need to find your own EPUB, pick your own topic, choose your own videos, et cetera. The text files we're bringing in are just transcripts from particular YouTube videos of him being interviewed and talking. So a large chunk of the information is coming from those, and the rest is coming from his book "Running Lean".
Okay. So we have the standard LangChain setup: we put in our OpenAI key here, we're going to be using Chroma for the vector store, we've got a standard text splitter, and I'm going to use the chat models from OpenAI, so the GPT-3.5 Turbo API. I'm also going to be using the text loader to bring in the text files.

You can see that once I've downloaded those files, bringing them in is pretty simple: I basically just go to that folder, grab all the files that are actually text files, and run them through the text loader. In this case there weren't that many; there were only three of them.
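The loading step looks roughly like this (a sketch; the folder name is an assumption for wherever you've put your transcripts):

```python
import glob
from langchain.document_loaders import TextLoader

# Run every .txt transcript in the folder through the text loader
docs = []
for path in glob.glob("./ash_maurya_texts/*.txt"):
    docs.extend(TextLoader(path).load())

print(len(docs))  # 3 transcript files in my case
```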
For the EPUB, this is also pretty simple. I've just loaded the EPUB into Colab and I'm passing it into this UnstructuredEPubLoader. It loads it up and handles all the EPUB parsing for us, and we can see that we end up with basically one EPUB document.
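The EPUB loading is just a couple of lines (a sketch; the filename is an assumption for whatever EPUB you upload to Colab):

```python
from langchain.document_loaders import UnstructuredEPubLoader

epub_loader = UnstructuredEPubLoader("running_lean.epub")
epub_docs = epub_loader.load()
print(len(epub_docs))  # the whole book comes back as one document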
Now we want to split everything into chunks. For the splitting we're basically going to use the same settings as in one of the previous videos, with a chunk size of 1000. You can play around with this a lot; it's one of the things you want to think about for your particular use case. Maybe you go for a smaller chunk size with a bigger overlap than the one I'm using here.

You can see I'm splitting the documents, which are the text files, into texts_01, and splitting the EPUB into texts_02. When they come out, the transcripts are split into 112 chunks and the book itself is split into 480 chunks. Because these are just Python lists, we can simply add them together, and we end up with a list of 592 chunks. If we go in and look at these, they're just documents: each one is a chunk of text, maybe with a link in it, et cetera. This is what we're actually going to embed.
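The splitting and combining step is roughly this (a sketch; the overlap value is an assumption, tune it for your use case):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

texts_01 = text_splitter.split_documents(docs)       # the transcript files -> 112 chunks
texts_02 = text_splitter.split_documents(epub_docs)  # the book -> 480 chunks

# Both are plain Python lists of Document chunks, so we can just concatenate them
texts = texts_01 + texts_02
print(len(texts))  # 592 chunks total
```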
Now, because we're using all OpenAI in this case, we're going to use the OpenAI embeddings. Honestly, personally, I'd probably prefer to use the instruct embeddings rather than the OpenAI embeddings. The challenge is this: even if we're using OpenAI for the language model, by using the instruct embeddings we're not tied to OpenAI. We could always swap out the language model later, but if we've gone for the OpenAI embeddings, we can't just swap in another embedding system without re-indexing all of our data. So even though I'm showing the OpenAI embeddings here, because this notebook is basically all OpenAI, normally I would probably go for the instruct embeddings.
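Setting the embeddings up is one line either way (a sketch; the instructor model name shown commented out is the open alternative I'd normally reach for):

```python
from langchain.embeddings import OpenAIEmbeddings
# from langchain.embeddings import HuggingFaceInstructEmbeddings

embeddings = OpenAIEmbeddings()
# The open alternative, so you're not tied to OpenAI for re-indexing later:
# embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
```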
Next we're basically just creating our database. If you haven't seen any of this before, look at the earlier videos where I explain it more in depth. We've got a Chroma database; we're passing in the embeddings here, so that's the embedding function to be used; we're passing in the documents; and we're persisting it all to a directory.

We then make a retriever, and we can test it. If we ask it "What is product market fit?", we can see that it's returned four chunks. If we look at the first chunk, sure enough the answer looks to be in there, so it looks like the embeddings are working. For the actual end system we're going to set up our retriever to return just three chunks. Again, this is something you'd want to test and see whether it's right for your use case or not.
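Building and persisting the Chroma store, then wrapping it as a retriever, looks roughly like this (a sketch; the persist directory name is an assumption):

```python
from langchain.vectorstores import Chroma

persist_directory = "db"

# Embed all the chunks and write the index to disk
vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embeddings,
                                 persist_directory=persist_directory)
vectordb.persist()

# Test retrieval - the default retriever returns 4 chunks
retriever = vectordb.as_retriever()
print(len(retriever.get_relevant_documents("What is product market fit?")))

# For the actual chain, only pull back 3 chunks
retriever = vectordb.as_retriever(search_kwargs={"k": 3})
```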
Next up, we're going to make the actual chain. We're just using a RetrievalQA chain. In this case we're going to use the ChatOpenAI language model, and we pass that in here. We're just going to "stuff" the whole thing in, so we're not doing any map-reduce or anything fancy like that. And we've got our retriever that we defined before.
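The chain itself is a standard RetrievalQA with the "stuff" chain type (a sketch; the temperature value is an assumption):

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# "stuff" just packs the retrieved chunks straight into the prompt - no map-reduce
qa_chain = RetrievalQA.from_chain_type(llm=llm,
                                       chain_type="stuff",
                                       retriever=retriever,
                                       return_source_documents=True)
```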
Now, one of the issues I found when I was first playing around with this is that I really want the answers to sound like they're coming from him. So if we're asking a question like "How do I do this with a startup?", "How do I do customer interviews?", or "What's important about pivoting?", you really want, at least in my case, the answers to feel as if they're coming from him and not from some third party.

To do that, I've changed the prompt in this QA chain. The original prompt was: "Use the following pieces of context to answer the user's question. If you don't know the answer, just say you don't know" - and that's all it had. The challenge with that is you will often get answers like "As a large language model, I can't answer this" or "As a large language model, I don't find the answer in here". Whereas you really want him to answer it and say something like "Well, that's not what we're talking about" or "I don't know the answer to that question".

So I've played with the prompt here. You can see the prompt I'm putting in is: "Your name is Ash Maurya. You're an expert on lean startups. Use the following pieces of context to answer the user's question" - so that part I haven't changed - "don't make up the answer", and then I've also added "Always answer from the perspective of being Ash Maurya". The idea is that, with this, we always want the answer to come from him and never just from the language model itself.
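Swapping the persona prompt in looks roughly like this; reaching into the chain internals this way is a sketch and the exact attribute path can vary between LangChain versions, but the wording matches what I described:

```python
persona_template = """Your name is Ash Maurya and you are an expert on lean startups.
Use the following pieces of context to answer the user's question.
If you don't know the answer, just say that you don't know - don't make up an answer.
Always answer from the perspective of being Ash Maurya.
----------------
{context}"""

# With a chat model, the context sits in the system message of the stuff chain
qa_chain.combine_documents_chain.llm_chain.prompt.messages[0].prompt.template = persona_template
```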
All right. So we've got that, plus some helper functions for printing out the results. Then we come down and run it. The first question is "What is product market fit?", and it's done a nice job here: "Product market fit is the point where a startup's product or service satisfies the needs of the market", and it goes on a little bit. Now, you will get slightly different answers each time you run this. Honestly, for me, I think the answers were better when I was using the instruct embeddings rather than the OpenAI embeddings, looking at this now. We've also got sources coming back, so we can see which chunks contributed to this answer: the book was used, and then also two of the videos.

"When should I quit or pivot?" Here you can see that, again, it's got an answer, and the answers read as if they're coming from him. That's what we want: it's answering in his voice, in his style.

Next one: "What is the purpose of a customer interview?" We get the information there.

What about if I ask it his name? If I ask "What is your name?", it answers from his perspective and gives his name. There's another one about interviewing techniques as well.

And then I want you to see an example of what I get back if I deliberately ask something that's off topic. So I asked, "Do you like the color blue?" My guess is that there's no way he refers to the color blue anywhere in the source material. The answer we get back is: "I'm sorry, but my personal preferences are not relevant to the topic at hand. Is there anything related to lean startups or entrepreneurship that I can assist you with?" This is exactly what I was talking about: we don't want it to say "Oh, I've looked at all the context and there's nothing in there" or "As a large language model, I don't have a good answer", that kind of thing. This is a way to try and keep it on track, as if you're actually talking to the person themselves.

Finally, we can ask, "What books did you write?" You can see again that it gives his name and then says it has written two books on lean startup methodology: the first, called "Running Lean", published in 2010, and the second, "Scaling Lean", published in 2016. So it's pulled those details from the book itself.
So that's how we would do it with OpenAI. The next thing we're going to look at is doing it with open source models and basically no OpenAI at all.

Okay. So this is the open source version. You're going to see that there's not a huge amount of changes; basically we're just swapping out models. The model we're going to be using for the language model is the StableVicuna model, so if I come down here, the big difference is just that we're now loading in StableVicuna 13B.

Now, in trying to get this to work, I've gone through ten different models on the Hugging Face Hub, and to be honest, most of them have been total crap. The challenge is that you've either got models that are fine-tuned for being chat models, and so they work quite nicely for chat-style tasks, but when they actually have to take in some information, they don't do very well with that. The other problem is that some of the models require custom code to run, which makes it much harder to get them running as a Hugging Face pipeline for LangChain, so that also rules out some models for this particular task.

Some of the models I didn't try out were the LLaMA models, so it's quite possible that models like Vicuna, some of the Wizard models, and other fine-tuned versions of the LLaMA models may do well, especially the bigger ones. I also didn't want to pick a really big model for this, because I know most people won't be able to serve one; I was trying to get something around 7 billion parameters, and in the end I had to settle for something around 13 billion. Some of the models that showed a lot of promise were things like the LaMini models, but when you actually look at the output coming back from the questions, it just wasn't good enough. Or you would find that they have too short a context span: some of those models are actually not bad, but their context size is only 512 tokens. That might be okay for certain kinds of tasks, but for this one, where I wanted a decent prompt describing the character or person that's going to be responding, it turned out that would fill up the token budget very quickly.
So, okay, the first thing I'm doing is bringing in the StableVicuna model here. We set up the generation pipeline, and we just wrap it as a local LLM with the Hugging Face pipeline. I've made videos about this before, so I'm not going to go too in-depth into these kinds of things.
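The pipeline setup follows the same pattern as my earlier local-model videos (a sketch; the exact model repo id and the generation settings are assumptions, adjust them for your GPU):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain.llms import HuggingFacePipeline

model_id = "TheBloke/stable-vicuna-13B-HF"  # assumption: any StableVicuna 13B HF repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             torch_dtype=torch.float16,
                                             device_map="auto")

pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer,
                max_new_tokens=256,
                do_sample=True,
                temperature=0.1)

# Wrap the transformers pipeline as a LangChain LLM
local_llm = HuggingFacePipeline(pipeline=pipe)
```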
I can see sure enough, you know, it's generating out something on task. The challenge with this model
is you get a lot of these. You know, hash, hash, hash human
hash, hash, hash assistant. and in the end I have to write a
function to basically filter this out in the responses coming back for this. Now, ideally, I'd like to bake that into
the pipeline at some point, if I was going to actually use this in production. loading up the data is exactly the same. There's no difference there. I'm loading up the text files. I'm loading up the ePubs there. Doing the splitting again is
exactly the same as before. We then basically do the embeddings. Now the embeddings that I chose here
Now, the embeddings I chose here are the e5-large-v2 embeddings. Okay, so how did I pick the embeddings to use? Normally I would use the Instructor-XL embeddings. The reason I decided to change is that they're now number two on the MTEB leaderboard on Hugging Face; you can see they've been surpassed by this e5-large model, so I thought, okay, we'll try that out. If you look at my other video and Colab, I'm using the Instructor-XL one, and most of the time I'd probably still lean towards that, but I wanted to try this one out. You can also see that the embedding dimension for these is larger.

The code for putting this in is quite simple: we bring the model in, declare it as a Hugging Face embedding, and pass the model name in so it's loaded as a sentence-transformer embedding. You'll see that it actually has to convert the model to what we want: the way it's published, it's set up as a normal transformer, so it gets converted to an embedding transformer by adding mean pooling on top. That's what I would expect it to do anyway; normally that's what the sentence-transformers library does for most models.
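Wiring the e5 embeddings in is essentially one line (a sketch):

```python
from langchain.embeddings import HuggingFaceEmbeddings

# Loaded via sentence-transformers, which adds the mean pooling for us
embeddings = HuggingFaceEmbeddings(model_name="intfloat/e5-large-v2")
```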
Creating the database: nothing different there. We're still using a Chroma retriever; nothing different there either. The only thing is that I've set k=2 rather than k=3, just so we're not using too many tokens. A lot of the reason for that is that I was trying some of the models with 512-token contexts; for StableVicuna you could actually go back to k=3 here. And where we had the LLM being ChatOpenAI before, I've commented that out, and I'm now using the local LLM, which is the StableVicuna one we set up before.
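So the chain setup is the same as before, just with k=2 and the local model swapped in (a sketch):

```python
retriever = vectordb.as_retriever(search_kwargs={"k": 2})

# llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)  # what we used before
qa_chain = RetrievalQA.from_chain_type(llm=local_llm,
                                       chain_type="stuff",
                                       retriever=retriever,
                                       return_source_documents=True)
```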
If we look at the prompt, we can see what it was like before, and we need to change it again. Because we're not using the OpenAI chat model, accessing the prompt is different. Before, it was one of the messages, sitting in the system message; here, we're putting everything into a single prompt template, and because of that you'll see we've also got the context and the question in there together. Now, you can play around with this; I actually changed the "Helpful answer" part to "AI". Play around with it and you might get better results.
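For a plain (non-chat) LLM the whole thing goes into a single PromptTemplate, with the context and the question together (a sketch; the exact wording is something to play with, as I said):

```python
from langchain.prompts import PromptTemplate

template = """Your name is Ash Maurya and you are an expert on lean startups.
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know - don't make up an answer.
Always answer from the perspective of being Ash Maurya.

{context}

Question: {question}
AI:"""

qa_chain.combine_documents_chain.llm_chain.prompt = PromptTemplate(
    template=template, input_variables=["context", "question"])
```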
I've also got a little function for trimming out everything after the "### Human" marker, plus our functions for processing the response.
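The trimming helper is tiny (a sketch, based on the marker StableVicuna keeps appending):

```python
def clean_response(text: str) -> str:
    # Cut off everything from the first "### Human" marker onwards
    return text.split("### Human")[0].strip()
```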
And then we can actually use it. So here, the first question: "What is product market fit?" It gives us a reasonably decent answer, and we can see the two sources it came back from.

Next question: "When should we quit or pivot?" Again, we're getting reasonable answers that certainly make sense. They're probably not as long as the OpenAI answers, partly because we're only getting two contexts back rather than three.

You'll also see that some of these answers just don't do that well. Here is "What is the purpose of a customer interview?" It does give us an answer about getting feedback from the people who might actually buy, but then it also tacks on an "unhelpful answer" of "to validate your business model" - why did it generate that?

It gets the name right, so it's got the personality a little bit, but you'll see that it's not good at saying no, whereas the OpenAI models are better at understanding when they should stay away from something. For "Do you like the color blue?", it says "Yeah, I love the color blue. It reminds me of the ocean." Now, that's clearly coming from the language model and not from the actual source material. We can see it pulled out the same sources as before, but the OpenAI model went through those sources, worked out there was nothing relevant, and told us so. This is a good example of this model hallucinating more than the OpenAI model.

Another example is when we ask "What books did you write?" Here it has basically made up some books rather than pulling the results back from the retrieved context; it's just making them up.
So play around with the models, but the real answer is that if you have to do this with an open source model, you probably really want to fine-tune it. Maybe that's something we can look at: using a fine-tuned model for this particular sort of retrieval, or what we'll often call a RAG model, a retrieval augmented generation model, which means you can fine-tune it specifically for that.

Most of the open source models we see on Hugging Face at the moment, even the ones that are fine-tuned, are fine-tuned for instruct or chat. So they tend to be very good at following instructions and that kind of thing, but they're not good at using tools, they're not good at producing reasoning-style outputs, and they're generally not as good at these retrieval augmented generation responses.

Anyway, this gives you a good comparison of OpenAI versus open source. Perhaps in the future we can also look at doing something like this with PaLM or some of the other big commercial models, to see how well they do at this particular task. A lot of this will come down to you playing around with the prompt to get the kind of prompt you want.

Anyway, as always, if you've got questions, please put them in the comments below. If you've found the video useful, please click like and subscribe, and I will talk to you in the next video. Bye for now.