Advanced RAG with Knowledge Graphs (Neo4J demo)


Captions
Today I'm going to share something a little different: a demo I made and have recently been showing to a couple of prospective clients of mine. The demo has gotten pretty positive, enthusiastic feedback so far, so I figured, why not also share it publicly here on YouTube? In this demo I talk about how you can use AI language models together with graph databases, and how the two have some pretty useful, interesting interactions.

For those of you who are less familiar with graph databases as a concept: it's basically a database where the information is structured into entities (nodes) and contextual relationships between those nodes. Although they definitely have a lot of use cases, for example in recommendation systems or in analyzing social networks, graph databases are still a bit more of a niche type of database compared to something more common like table-based databases or document databases. As it turns out, though, graph databases also have a lot of potential for creating retrieval-augmented generation, or RAG, systems. RAG refers to any situation where you are asking a language model a question, using that language model to retrieve some information from a database, and then using the retrieved information to generate your final answer; that's what retrieval-augmented generation stands for.

In the demo I cover, first of all, how you can create this kind of graph database entirely from unstructured data, like PDFs and markdown files, by feeding those documents into a language model and having the model extract the entities (the nodes) and the relationships, which you can then use to create the graph. Finally, I also share some points about the benefits: why you might want to use a graph instead of what is currently more common in RAG applications, which is a vector similarity search, where you basically use your documents as-is and create a vector representation of
those documents, and the language model then just tries to find the relevant document, and the relevant part of that document, for your question. So there are some differences between the two, and especially some ways in which vector similarity runs into issues where using a graph can help with a lot of those problems; I talk about those at the end. Without further ado, let's roll the demo. Hope you enjoy.

So here we have a graph describing, partially, the operations of an IT consultancy company. We have clients in green; clients have projects, in blue; projects use technologies, in orange; and projects also have people, in yellow, who work on those projects. Finally, people also have separate relationships to the technologies that they use, and they send messages on Slack talking about the clients and the projects.

This graph has been generated without writing a single line of Cypher, which is the database language of Neo4j. Instead, it has been generated entirely from unstructured and semi-structured data, such as project briefs like this one, which describe the client, the projects, and the technologies used. We also have profiles of people, in markdown as well, and finally we have Slack messages, in JSON format, like so.

What we are doing is feeding all of these documents into a language model using a prompt like this, asking the model to extract entities, and relationships between those entities, from the documents. In the case of the project briefs, we ask the model to find projects, technologies, and clients, and, given that they are all in the same document, we also want to generate relationships between all of those. Then, given an output of entities and relationships like this, we can automatically generate the Cypher statements and run those against Neo4j. We have a function here for converting the language model outputs into Cypher statements, which looks like this, and then, finally, we
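The extraction output and the conversion function are only shown on screen, not in the transcript, so here is a minimal sketch of the idea: turning an LLM's extracted entities and relationships into Cypher statements. The JSON shape (`entities`/`relationships` with `label`, `name`, `from`, `to`, `type` fields) and the function name are assumptions for illustration, not the demo's actual format.

```python
# Hypothetical sketch: convert LLM-extracted entities and relationships
# into Cypher statements. The input JSON shape is an assumption.

def to_cypher(extraction: dict) -> list[str]:
    """Turn an extraction result into idempotent Cypher statements."""
    statements = []
    for ent in extraction["entities"]:
        # MERGE (not CREATE) so re-ingesting a document doesn't duplicate nodes
        statements.append(f'MERGE (:{ent["label"]} {{name: "{ent["name"]}"}})')
    for rel in extraction["relationships"]:
        # Match both endpoints by name, then merge the relationship between them
        statements.append(
            f'MATCH (a {{name: "{rel["from"]}"}}), (b {{name: "{rel["to"]}"}}) '
            f'MERGE (a)-[:{rel["type"]}]->(b)'
        )
    return statements

example = {
    "entities": [
        {"label": "Client", "name": "Epsilon Finance"},
        {"label": "Project", "name": "Digital Wallet"},
    ],
    "relationships": [
        {"from": "Epsilon Finance", "to": "Digital Wallet", "type": "HAS_PROJECT"},
    ],
}
```

Each returned statement can then be executed against Neo4j with the official Python driver, one file at a time.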
just loop through each file in our directories, generate the Cypher, and run it against Neo4j, and that is how we arrive at the graph here.

So the question is: what can we actually do with a graph like this? I'm going to show you. We have a chat interface here that's connected to the Neo4j database, with a language model interface in between. Taking as an example our client Epsilon Finance, who has a digital wallet project using some Google technologies, we can ask a question like this, and the language model will generate the Cypher query that corresponds to the user's question, and then we get the answer, which is correct in this case. What I've found is that these language models, especially GPT-4, are really quite good at generating the Cypher, even for more complicated questions; even something like this doesn't cause any issues. And finally, because we have the language model as part of our chat interface, we can not only get results from the graph but also analyze them further, as in this case for sentiment.

Okay, there are a few more things I want to say about this, namely to answer the question: why graphs? If we have all this unstructured data for creating the graph, why not just put those documents into a vector index and run a similarity search, which is the way it's often done with applications like this? The answer, for a large part, comes down to the fact that graphs give much better results for what are called multi-hop searches, where the answer to the user's question consists of different fragments located across different documents. Rather than the classic example of looking for information about your company's healthcare policy, where the answer is typically located in one section of one document, if you need to combine context and information across different documents, then vector similarity doesn't really get far, especially for more complex queries.

Secondly, what also makes a graph
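The chat interface itself isn't shown in the transcript, so here is a rough sketch of the text-to-Cypher layer it describes: the language model turns a user question into a Cypher query, which is then run against the database. The prompt wording and the `llm`/`run_cypher` callables are stand-ins (assumptions), not the demo's actual code; in practice `llm` would be a GPT-4 call and `run_cypher` a Neo4j driver session.

```python
# Hypothetical sketch of a text-to-Cypher chat layer. `llm` and
# `run_cypher` are injected callables so the flow is testable without
# a live model or database.

CYPHER_PROMPT = """You are a Neo4j expert. Given this graph schema:
{schema}
Write a single Cypher query answering the question. Return only Cypher.
Question: {question}"""

def ask(question: str, schema: str, llm, run_cypher):
    """llm: str -> str (e.g. a GPT-4 call); run_cypher: str -> list of records."""
    cypher = llm(CYPHER_PROMPT.format(schema=schema, question=question))
    return cypher, run_cypher(cypher)

# Stubbed usage, no external services needed:
fake_llm = lambda prompt: "MATCH (c:Client)-[:HAS_PROJECT]->(p) RETURN p.name"
fake_db = lambda cypher: [{"p.name": "Digital Wallet"}]
query, rows = ask("What projects does Epsilon Finance have?",
                  "(Client)-[:HAS_PROJECT]->(Project)", fake_llm, fake_db)
```

Because the model also sees the query results, the same loop can go one step further and summarize or analyze them, as in the sentiment example above.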
database great is that both nodes and relationships can have properties, and those are all searchable. For example, we might have a person who's related to a project, who's working on a project, but in that person-to-project relationship we can also have additional information, such as the dates during which the person has been working on the project. We could also have something like sentiment in there, if we can get that information about how the person's experience of working on the project was. And if we have that information, then we can also search by it: we can search for all the person-to-project relationships where the sentiment is negative, for example, and that's going to return all the people and all the projects where that's the case.

Finally, the graph is also a very flexible data model, well suited for incrementally increasing the amount of data, the number of nodes in the graph. You don't need to define a schema when creating the graph database and then stick to that schema forever. Just to show you what this means: our graph looks like this after we have ingested about half of the project briefs, and like this after ingesting the rest; then we add the people, which is an entirely independent, separate process, and finally we add the Slack messages, again as an independent process. Of course, you need to make sure that you're processing all the incremental new documents in a way that maps correctly onto the existing graph, creating relationships to the existing nodes, but from the graph database's point of view there really are no practical limitations to how much you can expand the graph: add more data, make it bigger, and make it better.
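The relationship-property search described above can be sketched as a single parameterized Cypher query. The labels (`Person`, `Project`), relationship type (`WORKS_ON`), and property names (`sentiment`, `start_date`) are assumptions about the demo's schema, chosen to illustrate the technique.

```python
# Hypothetical sketch: search by a property stored on a relationship,
# e.g. every person-project relationship whose sentiment is negative.
# Schema names (Person, WORKS_ON, sentiment, start_date) are assumed.

def negative_experiences_query() -> tuple[str, dict]:
    """Return a parameterized Cypher query and its parameters, suitable
    for something like neo4j's session.run(query, **params)."""
    query = (
        "MATCH (p:Person)-[w:WORKS_ON]->(pr:Project) "
        "WHERE w.sentiment = $sentiment "
        "RETURN p.name AS person, pr.name AS project, w.start_date AS since"
    )
    return query, {"sentiment": "negative"}
```

The point is that `w`, the relationship itself, is filtered and returned just like a node would be, which is exactly the kind of query a vector similarity index has no equivalent for.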
Info
Channel: Johannes Jolkkonen | Funktio AI
Views: 45,044
Id: nPG_jKrSpi0
Length: 8min 40sec (520 seconds)
Published: Fri Oct 20 2023