RAG with a Neo4j Knowledge Graph: How it Works and How to Set It Up

Captions
What's going on, everybody? Welcome to another video. Today we are talking about RAG, retrieval augmented generation, with a Neo4j knowledge graph. First I will give you a short introduction to what RAG is and how it works, and then I will show you in a quick code example how we can set it up on our laptops using the GenAI Docker stack. Let's get started.

Let's kick this one off by understanding what RAG actually is. LLMs are trained on publicly available data and can therefore answer common-sense questions, for example: "List three ideas for a New Year's party theme." The AI would reply: "Sure, here are three ideas ...". However, let's assume you have a large Neo4j knowledge graph containing information and relationships which you don't want to publish on the internet, so the LLM cannot be trained on this data. For example, our Neo4j knowledge graph contains documents such as the deployment model of our organization and instructions for retrieving your security credentials. The LLM has no access to this data and therefore cannot answer related questions. If the user asks, "Who do I need to contact to get my database password?", the LLM would reply with nothing meaningful. Somehow we have to transfer the knowledge stored in the knowledge graph to the LLM so it can answer such questions, and that's where RAG, retrieval augmented generation, comes into play.

Let's say we have a Neo4j knowledge graph containing unstructured data, for example text, and relationships between the documents, as we've seen before. We can use tools like LangChain to build powerful GenAI transformation chains: retrieve the text representation of the nodes, embed the text documents using an embedding model to obtain a vector for each document, and store the embeddings as node properties in our graph. Finally, we create a vector index to allow for efficient vector similarity search in the graph database. Now we have vector representations of our nodes in the Neo4j vector store: given a query vector, the vector store returns the k most similar vectors, for example three or five, by comparing the query against the vectors in the store.

How does this help us with our problem? Having the vector index with the embedding vectors in our Neo4j graph, we can build another LangChain chain to provide context for the questions we ask the LLM. If the user asks, "Who do I need to contact to get my database password?", we embed this question using the same embedding model we used for our nodes, which gives us a vector representing the question in the same embedding space. We use this vector to query the vector index, which returns, say, the three most similar nodes in our graph together with a similarity score. We can then use these three nodes to query the knowledge graph itself on its structure, that is, the relationships, with a plain Cypher query, and retrieve even more context. This yields a couple of documents related to the question the user asked. Finally, we augment the question originally asked by the user with the information retrieved from the knowledge graph and send this augmented question to the large language model, which answers based on the context we provided and therefore on our internal documents. For example, the LLM could answer, "Ask Laura from the infrastructure team." And that's how RAG, retrieval augmented generation, works.
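To make this flow concrete, here is a minimal sketch in Python using LangChain's Neo4j vector store. It is not the code from the video; the index name, credentials, and model names are illustrative assumptions, and import paths may differ slightly between LangChain versions.

```python
# Minimal sketch of the RAG flow described above; index name, credentials,
# and model names are illustrative assumptions.
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Neo4jVector

# Connect to a vector index that already exists in Neo4j.
store = Neo4jVector.from_existing_index(
    OllamaEmbeddings(model="llama2"),
    url="bolt://localhost:7687",
    username="neo4j",
    password="password",
    index_name="documents",
)

question = "Who do I need to contact to get my database password?"

# 1. Embed the question and run a similarity search (k most similar nodes).
hits = store.similarity_search_with_score(question, k=3)

# 2. Augment the question with the retrieved context.
context = "\n---\n".join(doc.page_content for doc, score in hits)
prompt = f"Answer based on this context:\n{context}\n\nQuestion: {question}"

# 3. Send the augmented question to the LLM.
llm = ChatOllama(model="llama2", base_url="http://localhost:11434")
print(llm.invoke(prompt).content)
```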
Now I would like to give you a short demonstration of how you can set this up on your local laptop using a repository called the GenAI Stack. This repository gives us a fairly complete stack for such GenAI purposes and orchestrates multiple technologies. It uses an LLM provider; in my case that will be Ollama. On Docker for Mac I have to install Ollama locally on my machine, because Docker for Mac lacks GPU support and therefore has performance limitations; this is documented in the repository of the GenAI Stack. Aside from that, the GenAI Stack starts a Neo4j database and ships a few applications: a data loader which can export questions from Stack Overflow and load them into a Neo4j model, and a chat front end with a simple toggle for whether we want to use RAG or not. In the background it orchestrates various kinds of LangChain chains, which we are going to see in the code later on.

To get this running I have installed Ollama on my Mac. If you would like to do the same, I recommend the Ollama website, which makes the installation quite easy. In the terminal I can simply type `ollama serve`, and it starts a local server we can connect to in order to interact with the LLM. Secondly, I have cloned the GenAI Stack repository. The first thing to do there is to take the env.example file and create a .env file from it in the project root directory. The name of this file is important because it is hardcoded in the Python applications. Within the .env file we can set a couple of configuration options for the GenAI Stack. The first one is which LLM to use; in our case llama2, which I have pulled into my Ollama installation as well. Secondly, we can specify the embedding model, which will be used to embed our questions and the data in our knowledge graph into vector form. There are also Neo4j options, which I left at their defaults. Finally, and this one is quite important, there is the Ollama base URL, which is how we reach the Ollama service. As we saw on the command line, Ollama opened up port 11434, and this has to match the Ollama base URL. On a Mac we use host.docker.internal to refer back to localhost on the Docker host: since we are not running Ollama in a Docker container, we have to reference our localhost using this special hostname. And that's it for the configuration part.
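For reference, a .env file along these lines should work; the variable names follow the env.example shipped with the GenAI Stack, while the concrete values shown here are assumptions for a local Ollama setup.

```
LLM=llama2
EMBEDDING_MODEL=sentence_transformer
NEO4J_URI=neo4j://database:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
# Ollama runs on the Mac host, not in a container, hence the special hostname
OLLAMA_BASE_URL=http://host.docker.internal:11434
```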
Having done that, we can start Docker Compose, but first I would like to quickly show you the docker-compose file. Here all the services are specified, and we can see that the LLM container will only be started if we pass in a profile called linux; I won't do that, because I have Ollama running locally. It also starts a Neo4j database. I don't want the volumes to be mounted on my machine, so I commented that out, meaning the data will be lost once I remove the container. Then there are the loader, the API, and so on. Finally, back on the command line in the project directory, I can run `docker compose up`, and it will start all of the services in the docker-compose file. I have already pulled all of the images, which is why it's a lot faster here; the first time you run this on your machine, it can take quite a while to pull them all. We can see that it's pulling the model from Ollama, and something has already happened on the Ollama side. It then starts the Neo4j database, and after that the services for the bot, which is the chat front end, and the loader, which we're going to use to load data from Stack Overflow into Neo4j.

Let's head over to the loader, which we can find at localhost on port 8502. This loader front end has been implemented by the GenAI Stack, so it's nothing I have done; I'm just showing how to use it. With it we can import questions from Stack Overflow into our Neo4j graph. Here we can select a question tag and how many questions we would like to import; let's use 200 for now. We can also adjust the selection criteria down here, but I'll just start the import. In the repository we can see what happens in the loader: it accesses Stack Overflow to retrieve the data and then uses Cypher queries to generate a graph structure, which it shows upon completion of the loading process. We can see that it creates question nodes, answer nodes, the links between them, and user nodes; all of that has been implemented for us already. Another thing to note about the loader is that it also embeds our text, stores the embeddings as properties on the nodes, and finally creates a vector index in Neo4j. All of that is implemented in the loader.py module.
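The embedding part of the loader can be approximated with LangChain's Neo4jVector.from_existing_graph, which embeds existing nodes and creates the vector index in one call. This is a sketch rather than the stack's actual loader.py; the label, property names, and credentials are assumptions modelled on the imported Stack Overflow data.

```python
# Sketch of what the loader does after the import: embed the node text,
# store the vectors as a node property, and create a vector index.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Neo4jVector

Neo4jVector.from_existing_graph(
    OllamaEmbeddings(model="llama2"),
    url="bolt://localhost:7687",
    username="neo4j",
    password="password",
    index_name="stackoverflow",              # vector index, created if missing
    node_label="Question",                   # which nodes to embed
    text_node_properties=["title", "body"],  # text fed to the embedding model
    embedding_node_property="embedding",     # property that stores the vector
)
```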
The loading has now completed, and the loader also shows us the schema it has generated in Neo4j. If we head over to the Neo4j Browser, which as usual can be found on port 7474, we can have a look at the data that has been created for us. For example, there is a question with ID 9 saying "trying to build nested query in Cypher SQL to find gross sales", and so on, and it links to the answer, which says "you can simplify the query by using the new COUNT subquery". We would like to use this knowledge graph in our retrieval augmented generation.

To access the bot interface, we head over to localhost on port 8501, where a front end has been implemented in which I can type a question that is then sent to the LLM, augmented or not. I'll switch the toggle off, so we're not using any augmentation for this question, and ask: "Can I use subqueries in Neo4j?", and we'll see what happens. What happens in bot.py is that it instantiates two LangChain chains, one for the LLM only and one RAG-augmented, and depending on the flag we set in the front end, it uses one or the other. These chains are specified in chains.py, and there we can see what is actually happening (a sketch of both chains follows below). In the LLM-only chain we simply load the LLM specified in our configuration, send the user prompt to it without any pre-processing, and return the answer it provides. In the RAG-augmented chain we provide more context for the question. We do this using the Neo4jVector class with an existing index, a component from LangChain where we specify the connection details for our Neo4j database, the name of the index it is supposed to use, and an embedding model to embed our question. We can also specify which property of the nodes should be returned, together with a Cypher query that runs after the vector search has been performed. So we use the embedding vector to perform a vector search in the Neo4j vector store, and the results of the vector search are passed as parameters to our Cypher query as node and score, a tuple of the node and its similarity score. With these nodes we traverse the graph to find answers, i.e. more context, for the nodes retrieved before. We then return the context as plain text: down in the RETURN statement of the Cypher query, we compose the text from the context we have found. The question is augmented with all of that information and then sent to the LLM to be answered.

Once we head back to the chat interface, the LLM has created an answer. Now I would like to ask the very same question with RAG enabled. We can see in the chat front end that it generates a different answer, and the answer is much more precise than before. That's because much more context was available to the LLM for answering this question.
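As a rough sketch of the two chains described above: one sends the prompt straight to the LLM, the other runs the vector search first and enriches each hit with related answers through a retrieval query. This is not the stack's exact chains.py; the Cypher, the schema (Question/Answer nodes, ANSWERS relationship), and the credentials are assumptions. In Neo4jVector, the retrieval query receives each hit as `node` together with its similarity `score` and must return `text`, `score`, and `metadata`.

```python
# Hedged sketch of the two chains the bot switches between; schema and
# retrieval query are assumptions modelled on the Stack Overflow import.
from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Neo4jVector

llm = ChatOllama(model="llama2", base_url="http://localhost:11434")

def llm_only(question: str) -> str:
    # Chain 1: the user prompt goes to the LLM without any pre-processing.
    return llm.invoke(question).content

# Chain 2: vector search first; each hit is handed to the Cypher below as
# `node` with its similarity `score` to pull in extra graph context.
store = Neo4jVector.from_existing_index(
    OllamaEmbeddings(model="llama2"),
    url="bolt://localhost:7687",
    username="neo4j",
    password="password",
    index_name="stackoverflow",
    retrieval_query="""
        MATCH (node)<-[:ANSWERS]-(answer:Answer)
        RETURN node.title + ' ' + answer.body AS text, score, {} AS metadata
    """,
)
rag_chain = RetrievalQA.from_chain_type(
    llm, retriever=store.as_retriever(search_kwargs={"k": 3})
)

question = "Can I use subqueries in Neo4j?"
print(llm_only(question))
print(rag_chain.invoke({"query": question})["result"])
```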
This has been only a short demonstration of how to use the GenAI Stack. I think the repository is very helpful, and thanks to all the contributors who developed it for us. You can definitely play around with it and create different use cases similar to this one. If you would like to see more content like this, please leave a like or a comment, and see you next time in the next video. Bye-bye!
Info
Channel: Neo4j
Views: 29,964
Keywords: neo4j, graph databases, graphs, nosql
Id: ftlZ0oeXYRE
Length: 15min 56sec (956 seconds)
Published: Mon Dec 18 2023