Imagine you are a business owner. You have tons of company documents on your computer, some meeting notes in Notion, some other notes in Evernote, and messages on multiple channels like Slack and WhatsApp, but you constantly waste time searching for information in multiple places. And a few months back, you heard about people using ChatGPT as their personal assistant. "So, how can I help you?" "Oh, it's just more that everything feels disorganized. That's all." "You mind if I look through your hard drive?" But you don't really see how this can apply to you, because your documents are private and may contain sensitive data, and you can't just paste all of your documents into ChatGPT, as you have hundreds of them in a lot of different places. What if I told you that there is a way to build a personal assistant that has been fed your entire company knowledge base? Hey friends, and welcome back to the channel! I'm
Dona, a software engineer based in Sydney, and in this series of videos I will show you, step by step, how to build exactly that. In this specific video, I will focus on the different steps and concepts you need to understand the big picture. Let's start with our first concept: the LLM. LLMs, or large language models, are nothing more than a really smart, fancy, long-form autocomplete. When you send a request to an LLM like GPT, for example, it will find the most probable answer based on the billions of data points it was trained on and return it to you. And instead of giving you a single word, it gives you entire generated sentences. In this example, I send the prompt "the mouse is eaten by" to the LLM, and it gives me back the most probable completion. Now generalize that to a whole sentence, and you understand how an LLM works.
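To make that concrete, here is a minimal sketch of sending that prompt to an LLM from code. It assumes the openai Python package (version 1 or later) is installed and that an OPENAI_API_KEY environment variable is set; the model name is just an example, not a recommendation from this series.

```python
# Minimal sketch: ask an LLM to continue a sentence.
# Assumes `pip install openai` (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # example model name
    messages=[{"role": "user", "content": "Complete this sentence: The mouse is eaten by"}],
)

# The model returns the most probable continuation it learned from its training data.
print(response.choices[0].message.content)
```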
think "Great then I can just create my own LLM with my own documents" Yes you could but that is
really expensive and it wouldn't be as performant as one that has been trained on billions of data
already and if you want to retrain an LLM with billions of data it's really really costly so
except if you Elon Musk, "Hi Elon" and you want to spend a few million dollars that's not really
a solution here and if you are Elon Musk then send me an email and I would love to interview
you on my channel so you could think "Okay then I will just paste all of my documents in the prompt,
on ChatGPT, for example." Well, the limitation here is that this doesn't scale, and an LLM like GPT-3 also has a limited context length, which means you can't just hand it your entire document collection and ask it to search through it. For example, GPT-3.5 prompts are limited to about 4,000 tokens, which is around 3,000 words, or roughly 6 pages in A4 format.
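If you want to check that limit yourself, here is a rough sketch using the tiktoken package (my assumption, it is not mentioned in the video) to count how many tokens a document takes; the file name is hypothetical.

```python
# Rough sketch: count how many tokens a document uses for GPT-3.5.
# Assumes `pip install tiktoken`; the file name below is hypothetical.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

with open("meeting_notes.txt", encoding="utf-8") as f:
    text = f.read()

num_tokens = len(encoding.encode(text))
print(num_tokens, "tokens")  # GPT-3.5 only accepts ~4,000 tokens per prompt
```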
So you're going to ask me, "Okay, then what's the solution?" I want to see solutions, gentlemen, solutions... Well, the solution is to use multiple concepts, such as chunking, embeddings and a vector store. I will explain all of these concepts in this video, so let's get into it. Let's come back to our original problem.
I have multiple PDF documents, some text files and some WhatsApp conversations, for example. The first step is to split all of this text into small pieces of data called chunks. Here, I will divide all of the text into small, meaningful chunks of data, let's say around 200 characters maximum. I will also cut at new lines to avoid mixing contexts. Here you can see, for example, how I divide the Universal Declaration of Human Rights.
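As an illustration only, here is a tiny chunker along the lines just described: cut on new lines first, then cap each piece at roughly 200 characters. It is a simplified sketch, not the exact splitter used in this series, and the file name is hypothetical.

```python
# Simplified chunking: split on new lines, then cap each piece at ~200 characters.
def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    chunks = []
    for line in text.split("\n"):          # cut on new lines to avoid mixing contexts
        line = line.strip()
        if not line:
            continue
        # if a line is still too long, slice it into max_chars pieces
        for start in range(0, len(line), max_chars):
            chunks.append(line[start:start + max_chars])
    return chunks

# Hypothetical file name, used only for this example.
with open("universal_declaration_of_human_rights.txt", encoding="utf-8") as f:
    chunks = split_into_chunks(f.read())
print(len(chunks), "chunks")
```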
Now that this is done, I will create embeddings for each chunk of data. And you're going to ask me, "Dona, what's an embedding?"
Embeddings are a way to represent information, whether text, images or even audio, in numerical form, also known as a vector. Imagine, for example, that you want to store the following words: Orange, Apple, King and Queen. These could be stored as the following vectors: Orange (2, 2), Apple (3, 3), King (7, 6) and Queen (8, 6). If we plot them on a 2D graph, you will see that Apple and Orange are close to each other, and that Queen and King are also next to each other. That's because they are more alike: Apple and Orange are fruits, while Queen and King are more related to people.
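Here is that toy example worked out in a few lines, using plain Euclidean distance to measure how close the 2D vectors are; the numbers are the ones from the example above.

```python
# The toy 2D "embeddings" from the example, and a quick distance check.
import math

vectors = {
    "Orange": (2, 2),
    "Apple":  (3, 3),
    "King":   (7, 6),
    "Queen":  (8, 6),
}

def distance(a, b):
    return math.dist(a, b)  # Euclidean distance between two points

print(distance(vectors["Apple"], vectors["Orange"]))  # ~1.41 -> very close
print(distance(vectors["King"], vectors["Queen"]))    # 1.0  -> very close
print(distance(vectors["Apple"], vectors["King"]))    # 5.0  -> far apart
```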
But now you might wonder how to generate these vectors, and that's our second concept, which we call the embedding model. To transform our chunks of text into embeddings, an embedding model is used. An embedding model is nothing more than an algorithm that transforms a piece of data into a vector, and different embedding models will lead to different results. Going back to our example, another algorithm could, for instance, consider two words similar if they are next to each other in alphabetical order. In that case, Apple would be closer to King than to Orange. You might say that doesn't make much sense, and you're right, but that's just a simplified version to show you how the whole thing works. In reality, these vectors contain thousands of dimensions instead of just two, which makes them way more powerful and accurate, and real embedding models are far more sophisticated than the toy algorithm I just showed you.
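As one possible way to do this step, here is a sketch that turns the chunks into embeddings with the sentence-transformers library and the all-MiniLM-L6-v2 model; both the library and the model name are assumptions for illustration, not necessarily what we will use later in the series.

```python
# Turning chunks into embeddings with an off-the-shelf embedding model.
# Assumes `pip install sentence-transformers`; the model name is an assumption.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dimensional vectors

embeddings = model.encode(chunks)  # `chunks` comes from the splitting sketch above
print(embeddings.shape)            # (number_of_chunks, 384)
```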
Once we have transformed our chunks of text into embeddings, we need to store them, and we're going to use a vector store for that. A vector store is a place where all of the embeddings are stored. As there are thousands of dimensions per embedding, each vector store has its own algorithms to store, index and search through them.
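To show the idea, here is a minimal sketch that stores those embeddings in FAISS, one popular vector store among many; choosing FAISS here is my assumption for illustration.

```python
# Storing the embeddings in a vector store. FAISS is just one possible choice.
# Assumes `pip install faiss-cpu` and the `embeddings` array from the previous step.
import numpy as np
import faiss

vectors = np.asarray(embeddings, dtype="float32")
index = faiss.IndexFlatL2(vectors.shape[1])  # exact search over L2 distance
index.add(vectors)                           # each row keeps the same id as its chunk
print(index.ntotal, "embeddings stored")
```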
As a summary, all of the documents that we want to add to our personal assistant will be divided into chunks of text, then we will use an embedding model to create our embeddings and store them in a vector store. Before jumping into how to query this vector store: if you like this type of content, please leave a thumbs up and subscribe to the channel. Now, let's see how it works when we query this vector store to get an answer back for any question about our documents. The user writes a question, and we query the vector store with that question. Let's see what happens under the hood during the vector store retrieval. The user's question is also transformed into an embedding, and now that our question is in the graph as well, we can easily see which vectors are the closest to it. The retrieval returns a list of relevant embeddings, which can then be mapped back to chunks of text. Now we have the most relevant chunks of text related to the question. The next step is to use an LLM: we give it the relevant chunks of text along with the user's question, and it generates a proper, well-formatted response for our user. And that's it.
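Putting the query side together, here is a sketch of that retrieval-then-answer flow. It reuses the model, index, chunks and client names from the earlier sketches, so it is an illustration of the flow rather than a standalone program.

```python
# Querying: embed the question, find the closest chunks, then let an LLM answer.
# Reuses `model`, `index`, `chunks` and `client` from the earlier sketches.
import numpy as np

question = "What does article 1 of the declaration say?"  # example question

question_vector = np.asarray(model.encode([question]), dtype="float32")
distances, ids = index.search(question_vector, 3)      # 3 closest embeddings
relevant_chunks = [chunks[i] for i in ids[0]]          # map ids back to text

prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n".join(relevant_chunks) + "\n\n"
    "Question: " + question
)

answer = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(answer.choices[0].message.content)
```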
There was a lot to digest in this video, so don't hesitate to go back and pause to make sure you understand the whole concept. If you have any questions, let me know in the comments and I will be happy to answer them. In the next video, we will get our hands dirty in the code and implement these solutions. You will have a folder to put all of your documents in, and you will be able to ask anything about them. Thanks for watching, and see you in the next video!