GraphRAG works great, but there is one major issue, and that is the cost. Microsoft just open sourced
GraphRAG, a system that they presented almost a year ago. This is a groundbreaking system
that combines knowledge graphs with Retrieval Augmented Generation or RAG. And the goal is to address some of the
limitations of the current RAG systems. The code is available on GitHub
and you can start using it in your own projects right now. You can use this with both
proprietary models like GPT-4 and local models like Llama 3. In this video, I'm going to show you how GraphRAG works and then guide you through setting it up on your local machine to run some example tests. We will also take a look at the cost implications of a run. But before we dive into GraphRAG, let's first understand the motivation behind it by looking at the traditional RAG approach. Traditional RAG is a method where
the language model retrieves relevant documents from a large
corpus to generate more accurate and contextually relevant responses. There are three steps
and here is how it works. In the first step, we process the documents and convert them into vectors: we take our original documents, divide them into sub-documents using a chunking strategy, compute embeddings for each of the chunks, and then store the chunks plus the embeddings in a vector store. That becomes our knowledge base. The next phase is the query phase: the user asks a question, we compute embeddings for that query, then do a similarity search over all the vectors present in the vector database, and we retrieve the most relevant chunks or sub-documents from our vector store. Then we combine the query plus the retrieved context and give it to a large language model to generate the final response.
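To make those three steps concrete, here is a minimal sketch of that pipeline (my own illustration, not from the video), assuming the OpenAI Python client, a deliberately naive fixed-size chunker, and a brute-force similarity search standing in for a real vector store:

```python
# Minimal traditional-RAG sketch: chunk -> embed -> store, then
# embed query -> similarity search -> generate. Illustrative only.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Indexing phase: naive fixed-size chunking, then embed and store.
documents = ["...your source documents..."]
chunks = [doc[i:i + 1000] for doc in documents for i in range(0, len(doc), 1000)]
vectors = embed(chunks)  # this array plus the chunks act as our "vector store"

# Query phase: embed the query, retrieve the most similar chunks.
query = "What are the main themes in this story?"
scores = vectors @ embed([query])[0]  # OpenAI embeddings are unit-length, so dot = cosine
context = "\n\n".join(chunks[i] for i in np.argsort(scores)[-3:])

# Generation: query plus retrieved context go to the LLM.
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}],
)
print(answer.choices[0].message.content)
```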
As you can see, there are three major limitations with this approach. The first one is limited contextual understanding: RAG can sometimes miss nuances in the data due to its reliance on the retrieved documents alone. It doesn't have a holistic overview of the document, so it doesn't really understand the overall picture. Then there are scalability issues: as the corpus grows, the retrieval process can become less efficient. And there is the associated complexity: integrating external knowledge sources in a meaningful way can be complex and cumbersome. With GraphRAG, Microsoft is trying
to address some of these issues. Along with the code, Microsoft also
released a highly detailed technical report titled "From Local to Global: A Graph RAG Approach to Query-Focused Summarization." In this section, we are going to look at
the technical details of how this works. If you are just interested in using
the package, skip to the next section, but I highly recommend sticking around for this section to understand how this whole thing actually works. So here's a quick representation of the approach in the form of a flowchart that I created with the help of Claude 3.5 Sonnet. Now, just like RAG, there are two different parts or phases: one is the indexing phase and the other one is the query phase. During the indexing phase, we take
our original source documents and convert them into sub-documents using a chunking strategy. This step is very similar to traditional RAG approaches, but then within each chunk, we try to identify different entities. These entities can be people, places, companies, and so on, depending on the context that you're providing. We also look for relationships between these different entities across different chunks. So we do two parallel things: one is entity extraction and the other is relationship extraction. And we use that information to create a knowledge graph, which is basically a set of nodes and edges that preserves the relationships between the different entities.
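Here's a rough sketch of what that extraction step looks like conceptually. To be clear, this is not GraphRAG's actual implementation (its real prompts are far more elaborate, as we'll see later); the prompt and the sample chunks are mine:

```python
# Conceptual sketch of entity + relationship extraction into a graph.
import json
import networkx as nx
from openai import OpenAI

client = OpenAI()

EXTRACT_PROMPT = """From the text below, extract entities (people, places,
organizations) and the relationships between them. Reply as JSON:
{{"entities": ["..."], "relations": [["source", "target", "description"]]}}

Text: {chunk}"""

def extract(chunk: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": EXTRACT_PROMPT.format(chunk=chunk)}],
    )
    return json.loads(resp.choices[0].message.content)

# Build the knowledge graph across all chunks.
chunks = [
    "Marley was dead, to begin with. Scrooge was his sole executor.",
    "Scrooge's nephew Fred invited him to Christmas dinner.",
]
graph = nx.Graph()
for chunk in chunks:
    result = extract(chunk)
    graph.add_nodes_from(result["entities"])
    for source, target, description in result["relations"]:
        graph.add_edge(source, target, description=description)
```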
Now, based on the knowledge graph, we create communities, and I'll explain this step in a lot more detail in the subsequent section. But basically, we detect groups of entities that are closely connected to each other, and then we describe the relationships between these entities or communities at different levels. In the paper, they talk about three different levels of communities, and I'll explain what those are. But for each one of those,
we create summaries. So think about it this way: we basically look at a set of chunks and create summaries for those, then combine them with another set of chunks using a map-reduce approach and create summaries for those, and so on and so forth, until we have a holistic overview of whatever is in this set of documents.
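As a sketch of the community step, continuing the toy graph from above: the paper uses hierarchical Leiden community detection, but here I'll stand in networkx's greedy modularity clustering and a single level of summaries:

```python
# Detect communities in the knowledge graph, then summarize each one.
from networkx.algorithms.community import greedy_modularity_communities

communities = greedy_modularity_communities(graph)

def summarize_community(members) -> str:
    # Collect relationship descriptions attached to this community's entities.
    facts = [data["description"] for _, _, data in graph.edges(members, data=True)]
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": "Write a short report summarizing these facts:\n" + "\n".join(facts)}],
    )
    return resp.choices[0].message.content

community_summaries = [summarize_community(c) for c in communities]
```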
Now, during the query phase, we take the user query. Then we select the community level, basically what level of information or level of detail we want. Think of this as being like the retrieval process you do on chunks, but rather than chunks, you're now doing it on communities. We look at summaries of the
communities, which will generate partial responses for us. If there are multiple communities involved, we then combine those partial responses into a single response, and that is going to be the final answer from the model.
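And here's a sketch of that query side, again continuing from the snippets above: generate a partial answer per community summary (the map step), then combine them (the reduce step):

```python
# Map-reduce over community summaries to answer a global question.
def global_query(question: str) -> str:
    # Map: ask for a partial answer from each community summary.
    partials = []
    for summary in community_summaries:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user",
                       "content": f"Community summary:\n{summary}\n\n"
                                  f"Answer this question if the summary is relevant: {question}"}],
        )
        partials.append(resp.choices[0].message.content)
    # Reduce: combine all partial answers into one final response.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": f"Combine these partial answers into a single final answer "
                              f"to the question '{question}':\n\n" + "\n---\n".join(partials)}],
    )
    return resp.choices[0].message.content

print(global_query("What are the main themes in this story?"))
```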
As we will learn in this video, GraphRAG is awesome, but there are still use cases for traditional RAG systems, especially when it comes to the cost of running GraphRAG. If you want to learn about RAG beyond
basics, I have a course dedicated to the topic, in which we start with basic
techniques, and then we go into advanced techniques of building robust RAG systems. If that interests you, link
is in the video description. Now, back to the video. I hope this gives you a very good understanding of how GraphRAG works. Now let's set this up on our local machine
and we can start experimenting with it. They have provided very detailed
instructions on how to get started, so we're going to be using those. First, I'm going to create a conda virtual environment; I'm going to call it GraphRag. Then we need to activate this virtual environment with conda activate GraphRag, and our virtual environment is ready to go. Next, we need to install the package, so we're going to use pip install graphrag, which installs the graphrag Python package for us.
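For reference, that setup boils down to these commands (the Python version pin is my own choice, not from their instructions):

```bash
conda create -n GraphRag python=3.10  # version pin is an assumption
conda activate GraphRag
pip install graphrag
```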
Okay, next, we need to run the indexing process, and for that we need our own dataset. But before copying the dataset over, we're going to create another folder within the current working directory. As you can see, the current working directory is completely empty. They recommend creating a ragtest folder with another folder called input inside it, but you can essentially provide any path that you want. So what I did here was create another folder, ragtest/input, and we're going to put our data in there.
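Their getting-started guide creates it like this:

```bash
mkdir -p ./ragtest/input
```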
Next, we need a source document. Currently, I think they only support plain text, and they have provided a link to Charles Dickens's book A Christmas Carol, so we're going to just use that as our source of information. If I run the command shown below, it will download the text of the book. So here's the Project Gutenberg ebook of A Christmas Carol. Since they currently support only plain text, you could potentially also use something like Markdown files. And this is a pretty big book.
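The download command from their getting-started guide:

```bash
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt > ./ragtest/input/book.txt
```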
Okay, next, we're going to set up our workspace variables, and for that we will be using the command python -m graphrag.index. Basically, we want to create an index out of the data that we have provided, but before that we need to initialize the configuration for our workspace, so we pass the --init flag and then provide the root directory where the data is stored.
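So the initialization command is:

```bash
python -m graphrag.index --init --root ./ragtest
```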
When we run this, you're going to see that it creates a whole bunch of different files in our current workspace. So we can see that here is the input, but apart from that it also created an output folder where we can see a log; it hasn't really run the indexing process yet, because we still need to configure our LLM. It also created different prompts; these are the prompts that it's going to use internally to create the knowledge graph for us, and they are the prompts that Microsoft has set up. There has been a lot of discussion regarding these prompts. They are very comprehensive: it uses them not only to extract the different entities from the provided corpus, but also to create the communities as well as the summaries for each community.
Next, we need to provide our GRAPHRAG_API_KEY. This is basically the OpenAI API key, so you can select your OpenAI model and provide the key in here.
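The init step drops a .env file in the root directory for exactly this; you just fill in your own key:

```bash
# .env in the ragtest root
GRAPHRAG_API_KEY=<API_KEY>
```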
Now, you also have a settings.yaml file. This is where you set the different configurations. For example, we set our GRAPHRAG_API_KEY, so it's going to read the key from there. We want to use GPT-4o in this case, because it's faster and will hopefully cost us less. You can also set the maximum number of tokens that it's going to process, and there are a whole bunch of other settings you can tweak in here. If you were to use a local model, such as one that you are serving through Ollama, you can also change the API base path, i.e. the URL; for example, if you were to use Groq serving Llama 3, you would just provide that base URL here. For embeddings, it is currently using OpenAI's small embedding model, but you can change that if you want to use another provider. The chunk size is currently set to 300 tokens, with an overlap of a hundred tokens. We can play around with these, but we're going to just go with the defaults.
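Here's an abridged view of the relevant parts of settings.yaml. The field names follow the file the init step generated for me, but double-check your own copy, since defaults can change between versions:

```yaml
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat
  model: gpt-4o
  # api_base: http://localhost:11434/v1  # e.g. an OpenAI-compatible local server

embeddings:
  llm:
    type: openai_embedding
    model: text-embedding-3-small

chunks:
  size: 300
  overlap: 100
```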
Now, as I was showing you, there are different prompts; for example, for entity extraction, here's the prompt that it's going to use. You can modify these prompts based on your own needs, which I highly recommend doing, because that will give you a lot more control compared to whatever is there by default. Okay, so the previous command created the structure for us and set how the different parameters are configured, but now we need to run the indexer to actually start creating the index. So instead of initializing it, we'll just run the index creation process.
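That's the same module as before, just without the --init flag:

```bash
python -m graphrag.index --root ./ragtest
```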
This is going to go through the whole document, identify the different entities that are present in the corpus, create relationships between those, build a knowledge graph, then create communities on top of it, and finally create summaries of the different communities at the different levels. This process can take some time. I also want to see how much this is going to cost us, because cost is definitely a factor: you are not only running the embedding model, but also the entity recognition step as well as the community summarization step, both of which involve the use of an LLM. In this step, it's currently generating the community summaries and descriptions. Okay, so the index creation process is complete,
and then you can look at the output. Let's look at the different artifacts it created. These are just the database files that it created. There is also a JSON file which keeps track of different stats, for example the total runtime in seconds, which was about two minutes, and the fact that there was a single document; you get a whole bunch of information here. And then there is also an indexing-engine log that describes the different parameters. Now, the next step is
going to be to run queries. Again, we're going to just use the
examples that they have provided. There are different sets of queries that you can run. For example, in order to run a query, you're going to use python -m, which runs a module in the current Python environment; then, instead of the indexing module, you are going to run the query module. We will need to provide the path where the data is stored, and the method, which is basically the community level that you want to use. So if you want to use the root level, which looks at all the information present in the document, you can use the global method. A prompt like "What are the main themes in this story?" will need access to the global-level information.
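The global query from their examples:

```bash
python -m graphrag.query --root ./ragtest --method global "What are the main themes in this story?"
```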
If you run this, it will just use the global level, the top-level communities, to generate answers. And here's the response that we got. It says success, global search response, and lists the top themes in the story: transformation and redemption, charity and generosity, and so on. We are just looking at the examples that they have provided; in subsequent videos, I'll show you a lot more complex examples working with different types of datasets. Now, if you are looking for a specific character within a story, then you probably want to use the more local, lower-level communities or information. So in this case we use the local method, because we are specifically looking for a single character.
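The local query from their examples looks like this (their sample asks about Scrooge):

```bash
python -m graphrag.query --root ./ragtest --method local "Who is Scrooge, and what are his main relationships?"
```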
In this case, it will look at the community-level or chunk-level summaries and try to combine multiple of them to generate an answer about this specific character for us. And it was able to identify the different relationships. Now, a normal traditional RAG system might be able to do something like this, because it will simply look at the different chunks where this specific character is mentioned and whether they describe a relationship with another character. However, if you are looking for the
main theme of the document, that's where RAG is going to fail because RAG just
looks at the specific chunks that are retrieved during the retrieval process. It doesn't really have an
overall big picture of the corpus that you are providing. Also, both at the global and at the local level, it will tell you where the information is coming from, so it actually cites its sources, which is pretty neat. GraphRAG works great, but there is one major issue, and that is the cost. For this specific example, we sent
a total of 570 requests through the API, and we are talking about GPT-4o requests; for the embedding model, we only sent about 25 requests. In terms of the total number of tokens processed, it's well over 1 million tokens, which comes out to around $7. So we spent about $7 in total to process this book and create a graph RAG index, which could be prohibitively expensive for a large corpus of data.
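As a back-of-the-envelope check (my own arithmetic, with GPT-4o's pricing at the time taken as an assumption):

```python
# Rough cost estimate for the indexing run; prices are assumptions based
# on GPT-4o pricing at the time ($5/M input, $15/M output) -- check the
# current pricing page before relying on this.
INPUT_USD_PER_M = 5.00
OUTPUT_USD_PER_M = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * INPUT_USD_PER_M + output_tokens / 1e6 * OUTPUT_USD_PER_M

# ~1M+ tokens were processed in this run; the input/output split is a guess.
print(f"${estimate_cost(1_000_000, 130_000):.2f}")  # -> $6.95, roughly the $7 we saw
```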
So this is definitely something you need to consider if you're planning on using GraphRAG in your own application; I think this is substantially more expensive than building a traditional RAG system. Anyway, I highly recommend you check out GraphRAG. It's an innovative approach. Now, Microsoft is not the only company that has implemented a graph-based RAG. There are some other options: for example, LlamaIndex has its own Knowledge Graph RAG query engine, and Neo4j has its own graph RAG package that you can use to create graph RAGs. If there is interest, I will create some content comparing these different implementations as well. Let me know in the comment section below. I hope you found this video useful. Thanks for watching, and as
always, see you in the next one.