Convert any Text Data into a Knowledge Graph (using LLAMA3 + GROQ)

Video Statistics and Information

Captions
That's why today I'll use Llama 3 and Groq... To be honest, I'm quite impressed with Llama 3 because it can identify and extract... It's very beautiful, and here we have Tim, and then Apple...

Hey guys, it's Wilsen, back again. A few days ago I shared a basic tutorial on how to chat with a knowledge graph, and if you still remember, we used a structured format, which was a CSV file. Today let's delve into a more interesting challenge: converting unstructured text data into a knowledge graph. You'll find all the necessary code in the description below, so feel free to use it with your own data and your own use case.

First, let's talk about what I mean by "any text". As we know, in the real world text comes from many sources: we have TXT files, Markdown files, PDFs, websites, Wikipedia, and even YouTube, where we can load a transcript. So the next question is how to load this text, and with LangChain everything becomes easier. For loading a text file we have TextLoader, then PyPDFLoader for a PDF, WebBaseLoader for a website, WikipediaLoader for Wikipedia, and also YoutubeLoader. If you notice, all the outputs here have the same format, which is documents. So this is the first goal: once we have converted our text into these documents, we are good to go on to the next step.

In this tutorial I'm using WikipediaLoader. A few days ago Tim Cook, the CEO of Apple, came to Indonesia. It was exciting news, and that's why I want to know more about him. Using WikipediaLoader you get several documents, and as you can see, I don't want to use all of them, because some are not relevant. For example, here is one about the Canadian military, which I don't want. That's why I did a simple filtering, resulting in these raw documents. As you can see, it's a very long text, and of course our LLM can't process it in one run, so I use RecursiveCharacterTextSplitter to split it into several documents.

The next step is map-reduce summarization. It's actually optional: if you want to process all of your text, just skip this step and go to the next one. However, I only want to take the most important parts of these documents, and that's why I use map-reduce summarization. I experimented with Ollama and the Mistral model, and it took quite a long time, around 8 to 10 minutes for each run. I ran it three times, so I already saved the output, and this is the result. As you can see, each iteration gives slightly different results, so I did a little bit of editing and ended up with this final result. You can explore more with other Ollama models, the Hugging Face API, the OpenAI API, or anything you want, because as we know each model gives different results. In my opinion this final result is good enough, so I'm not trying other models.

Coming from a data science background, when we talk about text processing or NLP, the tools that came to my mind were spaCy and NLTK. So the first thing I did was search for how to leverage this kind of library for creating a knowledge graph. It turns out I found an incredible tutorial by [name unclear], who uses spacy-llm. I followed her tutorial, but instead of relying on the OpenAI API, I experimented with some open-source models. The first one was the Mistral 7B Instruct model. The setup was fine, and I downloaded the model, which took quite a long time for me, around 20 minutes, but in the end I got a kernel crash when loading the files. I moved to Colab and faced the same problem. The reason is that the Mistral model is quite big, and as far as I know spacy-llm does not support using a quantized model. So in the end the only solution I could think of was using a smaller model, such as a three-billion-parameter model. Based on the spaCy documentation, there are several supported models, such as OpenLLaMA, StableLM, and also Dolly. I tried them and they work; however, I'm not satisfied with the results, because none of the models could extract relations. So I searched for a solution and found a GitHub issue in Explosion's GitHub repository describing the exact same problem, and I saw that one of the developers said this.

So up to this point, I would say spacy-llm is a good choice if you have enough RAM and enough disk to load a bigger model, or if you're willing to pay for the OpenAI API so you can use a model through the API. For now I'm setting spacy-llm aside for this task, but I tried to gain as many insights as I could from the GitHub repo.

The first thing I noticed is the entity instructions. They provide a place to specify the labels, that is, the entities we want to get; for example, here we have dish, ingredient, and equipment. The same strategy is also applied in the relation instructions: if you pay attention, here we can also specify the labels, LivesIn and Visits. The last thing I noticed in this repository is the use of few-shot prompts. They not only give instructions and guidance, they also provide an example for the LLM. For example, the original text is "Laura bought a house in Boston with her husband Mark", the entities are person, GPE, and person, and the relationship is LivesIn. In my opinion, those three things are the reason why spacy-llm is better than just basic spaCy, or than passing all of our text directly to the LLM. We give guidance, instructions, and examples instead of letting the LLM rely on its own imagination, and that way we get better results.

During this exploration the Llama 3 model came out, and I thought, why not use this model to convert our text to a knowledge graph? That's why today I'll use Llama 3 and Groq. It's very fast, even when we use the 70B-parameter model.

So now let's get started. Since we're going to access our Llama 3 model through Groq, the first thing we need to do is import ChatGroq from LangChain and also set the API key. If you don't have an API key yet, simply go to console.groq.com and create one. It's totally free.

Once we have done the setup, I want to explain further the process of converting our text to a knowledge graph. If you still remember, there are three things I want to replicate: defining the entities, defining the relations, and the few-shot examples. Actually, I'm inspired by what [name unclear] has done in her experiment using schema.org to define entities and relationships. In my opinion schema.org is helpful because it provides very detailed explanations and descriptions of entities and relationships for most cases in this world. For example, if you have drug data you can use the schema for Drug, Product if you have a dataset about e-commerce products, or MusicRecording if you have very specific data about music recordings. In this case, since we're talking about Tim Cook and Apple, I will look at Person. If you look here, there are lots of entities and relationships, for example alumniOf, award, and many more. So for entity types I have listed person, school, award, company, product, and characteristic, and for relation types alumniOf, worksFor, and hasAward.

The next part I want to explain is the prompt itself. The first thing we define is the system prompt. The key part is here: the "head" key must contain the text of the extracted entity, with one of the types from the list provided in the user prompt; "head_type" is the type of the extracted head entity; "relation" is the relation between the head and the tail; and "tail" and "tail_type" are the tail entity and its type. I've also defined the examples: "Adam is a software engineer in Microsoft since 2009, and last year he got an award as the Best Talent". So the head is Adam, the head type is person, he works for Microsoft, and Microsoft is a company; and here Adam has an award, Best Talent, and many more. Next, I define an output parser: since I want the output as a list of JSON, I need a JSON output parser. Next is the human prompt, where I define the entity types, the relation types, and the text. Then we define the ChatGroq model; we choose Llama 3 70B.
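As a rough illustration of the prompt structure described above, here is a minimal sketch using only the Python standard library. It does not call Groq or LangChain. The prompt wording, the names `build_human_prompt` and `parse_triples`, and the exact label lists are assumptions made for illustration, not the code shown in the video. It builds a human prompt from entity types, relation types, and a few-shot example, then validates a JSON reply in the head / head_type / relation / tail / tail_type shape.

```python
import json

# Assumed label lists, loosely following the types mentioned in the video.
ENTITY_TYPES = ["person", "school", "award", "company", "product", "characteristic"]
RELATION_TYPES = ["alumniOf", "worksFor", "hasAward"]

# Hypothetical system prompt wording; the video's actual prompt may differ.
SYSTEM_PROMPT = (
    "Extract knowledge-graph triples from the text. Reply with a JSON list only. "
    "Each item must have the keys 'head', 'head_type', 'relation', 'tail', 'tail_type'. "
    "'head' and 'tail' hold the text of the extracted entities, 'head_type' and "
    "'tail_type' must come from the provided entity types, and 'relation' must "
    "come from the provided relation types."
)

# Few-shot examples based on the sentence quoted in the video.
EXAMPLE_TEXT = (
    "Adam is a software engineer in Microsoft since 2009, "
    "and last year he got an award as the Best Talent"
)
EXAMPLES = [
    {"text": EXAMPLE_TEXT, "head": "Adam", "head_type": "person",
     "relation": "worksFor", "tail": "Microsoft", "tail_type": "company"},
    {"text": EXAMPLE_TEXT, "head": "Adam", "head_type": "person",
     "relation": "hasAward", "tail": "Best Talent", "tail_type": "award"},
]

KEYS = ("head", "head_type", "relation", "tail", "tail_type")


def build_human_prompt(text):
    """Assemble the user message: allowed labels, examples, then the sentence."""
    return (
        f"Entity types: {json.dumps(ENTITY_TYPES)}\n"
        f"Relation types: {json.dumps(RELATION_TYPES)}\n"
        f"Examples: {json.dumps(EXAMPLES)}\n"
        f"Text: {text}"
    )


def parse_triples(raw):
    """Parse a JSON-list reply and keep only triples with allowed labels."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON list of triples")
    triples = []
    for item in items:
        if not isinstance(item, dict) or any(k not in item for k in KEYS):
            continue  # skip malformed items
        if item["head_type"] not in ENTITY_TYPES or item["tail_type"] not in ENTITY_TYPES:
            continue  # skip entity types outside the allowed list
        if item["relation"] not in RELATION_TYPES:
            continue  # skip relations outside the allowed list
        triples.append({k: item[k] for k in KEYS})
    return triples
```

Filtering the reply against the allowed labels is one simple way to guard against a model inventing types that were never requested; whether the video's notebook does this is not stated.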
So the next thing is to process the clean summary text. If you still remember, we already have it here. We run this line to open it and then split it on periods. Then, using the chain, we pass the entity types, the relation types, and the examples as parameters, and the text is a sentence from here. So we run this, and this is the result. To be honest, I'm quite impressed with Llama 3, because it can identify and extract the entities and relationships almost perfectly. For example, here Tim Cook works for Apple, iPod Nano is produced by Apple, Apple is founded by Steve Wozniak, Macintosh is produced by Apple. So yeah, as you can see, it's pretty good.

But of course the LLM is not perfect, right? Here, "he" actually refers to Tim Cook, and then "Cook" also refers to Tim Cook. I don't want duplicate nodes and entities that have the same meaning, so I still need to do a little bit of post-processing. Okay, so this is the final result after I did that processing.

Now the very last step is how to convert this into Cypher queries and then insert it into our Neo4j database. The first thing we need to do is load the saved result with this cell, and here we can see the entities and relationships: Tim Cook, person, works for Apple, and many more, which is perfect, I think. We want to upsert the entities into our Neo4j database, but we want to make sure there are no duplicates. That's why I'm using a set to get the unique entities. If you look here, the result is like this: Steve Jobs as a person, Macintosh as a product, another entry as an award, and many more.

The next step is to insert these into Neo4j, for which we can use this query. Here we specify the ID, the entity, and the label. For the ID we do a little bit of processing to replace spaces and hyphens and to make it lowercase. So we can run this, and let's see the result. Here we have the ID for Steve Jobs, and Steve Jobs is a person, and Macintosh, which is a product, with its own ID. So it's good.

Now let's define the relationships. To define relationships we can simply use this: the head, what the relationship is, and what the tail is. So here we define the head as the head entity, and also the tail, and we can run this. If you look here, we have, for example, Apple operates Apple Store, Apple CarPlay is produced by Apple, Apple has project Apple Maps, and Project Titan has a characteristic, which is perfect.

The last thing to do is connect to our Neo4j database. I use this code to connect, but if you want to know how to set up Neo4j from the beginning, please watch my first video. So now let's check. As you can see here, we have nothing: the node count is zero and the relationship count is zero. Now let's try to upsert this. Let's check, and yeah, it's done. Now let's check our Neo4j database. Okay, it's cool: we have the awards, the characteristics, the companies, the persons. If you want to see the whole knowledge graph, we can also use this query, and boom, it's very beautiful. Here we have Tim, and then Apple, the universities, the projects, iPhone, Apple Maps, Project Titan. So yeah, I think it's cool.

And that's it for today. We have learned how to convert raw text into a knowledge graph, starting from extracting the entities and relationships, then converting these entities and relationships into Cypher queries, and finally pushing the Cypher queries to our Neo4j database. I hope this tutorial is useful for you guys. If you have any questions or comments, just let me know in the comment section below. Bye-bye.
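The deduplication and Cypher-generation steps described above can be sketched as follows, using only the standard library. This is an illustration under assumptions, not the video's notebook: the helper names (`to_id`, `unique_entities`, `node_statements`, `relationship_statements`), the choice of underscore as the replacement for spaces and hyphens, the `id` and `name` property names, and the sample triples are all hypothetical.

```python
import re

# Sample triples in the head / head_type / relation / tail / tail_type shape.
TRIPLES = [
    {"head": "Tim Cook", "head_type": "person", "relation": "worksFor",
     "tail": "Apple", "tail_type": "company"},
    {"head": "Apple", "head_type": "company", "relation": "foundedBy",
     "tail": "Steve Jobs", "tail_type": "person"},
]

# Labels and relationship types cannot be passed as Cypher parameters,
# so they are interpolated into the query text and must be validated first.
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def check_ident(value):
    """Reject anything that is not a plain identifier."""
    if not IDENT.match(value):
        raise ValueError(f"unsafe label or relationship type: {value!r}")
    return value


def to_id(name):
    """Lowercase the name and replace runs of spaces/hyphens with '_' (assumed)."""
    return re.sub(r"[\s\-]+", "_", name.strip()).lower()


def quote(value):
    """Render a string as a single-quoted Cypher literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def unique_entities(triples):
    """Collect (name, type) pairs from heads and tails, without duplicates."""
    entities = set()
    for t in triples:
        entities.add((t["head"], t["head_type"]))
        entities.add((t["tail"], t["tail_type"]))
    return sorted(entities)


def node_statements(triples):
    """One MERGE per unique entity, so re-running does not create duplicates."""
    return [
        f"MERGE (n:{check_ident(label).capitalize()} {{id: {quote(to_id(name))}}}) "
        f"SET n.name = {quote(name)}"
        for name, label in unique_entities(triples)
    ]


def relationship_statements(triples):
    """Match both endpoints by id, then MERGE the relationship between them."""
    return [
        f"MATCH (h {{id: {quote(to_id(t['head']))}}}), "
        f"(t {{id: {quote(to_id(t['tail']))}}}) "
        f"MERGE (h)-[:{check_ident(t['relation'])}]->(t)"
        for t in triples
    ]


if __name__ == "__main__":
    for statement in node_statements(TRIPLES) + relationship_statements(TRIPLES):
        print(statement)
```

To push these statements to a database, one option is the official `neo4j` Python driver: open a driver with `GraphDatabase.driver(uri, auth=(user, password))` and call `session.run(statement)` for each statement. Using `MERGE` rather than `CREATE` is what makes the upsert safe to repeat.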
Info
Channel: Geraldus Wilsen
Views: 5,696
Id: ky8LQE-82xs
Length: 14min 3sec (843 seconds)
Published: Sat Apr 27 2024