Knowledge Graph Technology Showcase E2: Honest Review of Grakn

Video Statistics and Information

Captions
Hey everybody, welcome to the very first annual Knowledge Graph Technology Showcase. You might be asking yourself: why a showcase? Well, I often get the question "which tool is right for the job?", especially when I am dealing with knowledge graph and modeling questions, and what I would like to do is share with you some tools that I often point people to: some that are new on the scene, some that are going to be a surprise. And before you ask, none of this is sponsored. I have not been paid to do any of these; I reached out to everybody on my own time, and they were kind enough to meet with me and film these. These are honest reviews, and all of them are my own opinions. I hope this helps you in your search for the next knowledge graph technology that you want to dive into a little bit deeper. For all of the vendors that I'm going to be talking to, I have more information and their contact details in the description below, and if I missed any tools that you wanted to see me review, or if you have questions about the ones that we are reviewing, please leave them in the comments below; I, and the people I'm talking to, will be able to answer those questions for you.

So what are the criteria that I'm going to be walking through? There will be a summary at the very end of each video describing the answers to each of these questions, as well as a summary of any other little tidbits that we find out. The main things that I ask are: what are the use cases the tool is best suited for, and what features does it have to actually support those use cases? That's pretty important in understanding whether it's going to meet your needs. The other thing I like to talk about is what kind of data, what kind of format, and what kind of query language the tool supports. Two additional things I cover, because I think they're pretty important: first, interoperability. If a tool is not
interoperable, sometimes that's a make-or-break moment; other people don't mind if it's not interoperable, so we will certainly see people on both sides of the coin in these reviews. The second is: is this SaaS or not? A lot of people with small development teams, or no development team at all, don't have the resources to set something up that isn't SaaS. So I will be asking these questions, as well as many more. Please join me for the next few episodes. With that, let's check out this video's tool of choice, and let's kick it off.

All right, so Tomas, can you walk me through: what is Grakn? What is so interesting about it? How would you describe it?

If I tell you that my father is Bob, and I tell you that my father Bob has a sister, Susan, and I then ask you the question "who is my aunt?", given those two facts you would of course know that the answer is Susan.

Oh, I love this. I don't know if this has anything to do with the science that you're looking at, but Hobbs's discourse analysis talks a lot about this; that was something done in research quite some time ago. It's about the hops that people make in logic as they're talking to one another. Because we're people, we can fill in the blank. I love that, because now you're making it a little bit easier for people to get at that information.

Yeah, absolutely. The hops you're referring to: if we think of that question, effectively we're abstracting two hops, from me to my father, and from my father to his sister, into just one hop. Now, that's fairly simple to do, and humans can do it, and there are a lot of these hops we make in common speech. But when it gets to ten hops, or a hundred hops, over terabytes of data, that's really where the science of automated reasoning, or more specifically symbolic artificial intelligence,
becomes necessary to solve those problems. So the way we then built Grakn is that we built a data model which is a direct implementation of, effectively, a concept-level schema. That's why we don't actually expose any nodes and edges; we just work with entities, relations and attributes. That's the model that gets exposed to the developer, and it gets interpreted by a reasoning engine every time you query your data. The model is also fundamentally founded on hypergraph theory, which allows for hyper-relations and so forth.

And can you define hyper-relations? I think some people watching this video might not know what that is.

If you think of a binary relation, which is what you see in a triple or in a property graph, an edge is effectively defined as a pair of nodes: you always have two nodes, a start node and an end node. In a hypergraph, or a hyper-relation, you're dealing with a set, so you can have effectively an n number of nodes in one edge, or in one relation.

It's almost as if you are turbocharging the regular triple structure that we are all more familiar with. So what are some of the benefits of going this direction instead of a traditional property graph or triple store?

In practice, a triple or property graph is effectively a lower-level data model than what we would call a concept-level schema. The hypergraph is really a way for us to implement that concept-level schema, so anything you can model in a property graph can be expressed in a richer way at the concept level. The schema itself has more semantics effectively represented; the way to think of it is that it's a higher-level data model.

So what does this thing look like? How would somebody be able to move around it? And I think you did mention that the
data model itself isn't exposed to developers, so what is exposed to the developers? How would they get involved in something like this?

The software you're looking at right now is called Grakn Workbase; it's effectively Grakn's IDE. I can build a schema, and I can also visualize data. Right now there's a data set running on my computer, a financial data set, and we've asked the question at the top here that says "match $b isa bank". With that first statement we're saying there's a variable called $b, and it will be assigned to the entity type bank. Then there's another variable, $r, which we assign to the entity type risk-score, that has an attribute risk-level with the value "high". And then we say that the bank plays the role of risk subject, and the risk score plays the role of risk value, in a risk-exposure relation.

So this is the query that you are constructing based on the shape of the data that's underneath, is that accurate?

That's right. And for those unfamiliar with Grakn, we implement schema constraints at the database level; we declare our schema beforehand, so this query must adhere to your schema. There's one immediate thing to note here. Firstly, the two banks that have been highlighted, One-End Securities and RBS, are effectively the two results that we're getting back, and they have been connected to these three risk scores: war, civil unrest and cybercrime. Now bear in mind, and this is the first type of inference that Grakn does: we queried for a type called risk-score, however we have been returned the entity types cybercrime, war and civil unrest. These are subtypes of the parent type risk-score, so this is an example of one part of Grakn's inference engine, which is effectively type inference.

So when you're talking about that inferencing, if you're not exposing the schema to
the developers, how do they know what the end result of that inferencing is going to be? What you just described is a hierarchical structure, so how does Grakn figure out that that's a hierarchy?

To be clear, the schema is defined by the user.

Ah, gotcha. Okay, that's helpful.

Think of how it is in SQL: you create your tables, you define column names. Same thing with the graph. But just to go back to the example: we get back these risk scores, these subtypes. Now, the second thing to note here is that these four relations are what we call inferred relations. They don't exist in the graph; they're not persisted in the database. We're being presented them right now as if they were persisted, however if we look here on the right we see that it says "inferred relation". So they are effectively inferred, and what that means is that we can get an explanation of why they were inferred. If I press Explain on this button here, I see that under the hood it shows me that this bank is an owner of this energy asset, and they seem to be connected in a jurisdiction.

I like this; this is pretty slick. It's really difficult sometimes when you are running inferencing on one of those black-box systems, because you don't have any control over it. This seems to give you that kind of control. That's pretty cool.

Absolutely. So this is one explanation, for this relation only, which shows this connection: effectively these three hops being abstracted into this one hop. But then if we look at this one here, we see that there's a cyber attack on RBS, and the inference shows us that attack; there's an identifier of the attack, and RBS seems to be subject to it. But bear in mind, this relation here is not inferred, so this one shows as actually persisted, while this one is inferred. So we actually have a chain of rules, because this one is an inference. That means I can press
Explain again, and I get the explanation for that: it shows me that there's a subsidiary of RBS which seems to be the actual subject of the attack. But it doesn't stop there, because it's not a direct subsidiary either; it's a subsidiary of a subsidiary, because that one was also an inferred relation: that company is actually owned by another company, which is itself a subsidiary of RBS.

So we would have to go through each of the hops, essentially, to find that out, right?

If you're building software, you're not going to use Workbase to check whether there's an inferred relation; you're going to call the API and ask: hey, is this an inference?

So what if you find something that's wrong? How can you go back and fix it once you've discovered an error?

Well, "wrong" is only relative to what you've defined it to be; wrong will always be "right" in Grakn's context. If something is wrong, it means that you've defined the wrong rule, and that's because Grakn is a deterministic system. All the inferences are always fully deterministic, so they can't change. Unlike a machine learning model, which may be non-deterministic in some of the inferences it makes, Grakn is always fully explainable and deterministic given the same data and the same schema and rules.

I see. So to summarize: because the end user, the developer in this situation, has full control of the schema, if they put a rule in place and it's wrong... The example I always use is: birds have wings, things that have wings can fly, but what about a penguin? If you accidentally did that, you could discover it by looking at the inferred relations here, but it would then be up to you to go back and fix your schema, because you made that mistake.

Exactly right. And the schema is flexible, so you can add any type at any point in time, and also the rules: you can define and undefine rules on the fly.

And is this schema proprietary to you? Grakn doesn't
follow a particular standard. We decided, knowing and understanding what standards were available, that none of them struck the right balance between how expressive the model is and how simple it is to use. There's a lot of complexity in many of the standards, especially in the knowledge graph space, and that inhibits adoption. Generally, if you want a language or technology to be adopted, it needs to be simple enough yet expressive enough.

Let's look at how the schema looks. We call this model the Grakn knowledge model, and we say that the knowledge model needs to represent type hierarchies, hyper-relations and rules. Everything in Grakn is effectively a subtype of a thing, and then we have entities, attributes and relations as first-class citizens in the model. That means any entity can subtype another entity, and attributes can subtype attributes, so they can have a type hierarchy as well. And all entities, attributes and relations can relate to a relation through roles, so even a relation can be in another relation, or even an attribute can be in a relation.

So when you're saying that, it's again a hierarchical structure: you're not just subclassing something for no reason, it actually has a logic to it, for example being able to create relations between attributes.

Sometimes the model requires us to model things as attributes. We've seen so many examples where our users migrated from a property graph in which they had modeled things as nodes, but if you really thought about the model, those were actually attributes. That's why we say this is effectively a higher-level model that builds on top of a triple or property graph format. So this is effectively the schema language: on the left we have the graphical representation of the schema, and on the right
just the actual code to declare this schema. For those unfamiliar with our notation here: a rectangle is an entity, a circle is an attribute, and a diamond shape is a relation. We really try to be practical and simple about it; by the way, there are only about 15 keywords in the entire Graql language, and if we can remove things or make them shorter, we will. We want to keep it simple and beautiful; we want people to fall in love with the language, and I'm really proud to say that many people already have.

So what we're saying here on the right is: we're first declaring that a person is an entity, they have a name (so name is an attribute), and they play the role of employee. That is it; that's all I have to say to declare that there's a person entity that has an attribute and plays a certain role. We do the same with the company, which also has a name, but instead of playing the employee role it plays the employer role. Then we have an employment relation that relates those two roles, employer and employee, and we also declare the attribute type.

Perhaps if you're coming at it from an RDF or OWL perspective, a question we get a lot is why we don't support multiple inheritance. Grakn only supports single-type inheritance, which you see in the next slide, where we define a customer as a subtype of a person just by using the same notation: customer sub person instead of customer sub entity, and likewise startup sub company instead of startup sub entity. Now those startups and customers inherit all the properties and roles, so I don't have to re-declare that a startup can play the role of employer.

I see there's a lot of really cool stuff going on with this, but any time somebody talks about not using the standards, the question of interoperability comes up. So is that something that the customers you have
now are concerned with?

For us as a company, that's not very interesting to invest time in, and when we qualify people that may adopt our technology, especially with regard to sales opportunities, a hard interoperability requirement is, to be frank, a big no-no.

Sure, it's a business decision, right? If somebody's business really needs interoperability, because they have a lot of different systems that all have to talk to each other, or they have a business rule that says they must be able to switch over to something else in a year or two, then that matters. So I think that's a business decision. Now, when you get into the graph space, the size of data these types of tools can handle really does vary quite a lot. In the work that you're doing so far, at what point does the system start to break?

Just to give you a little context of where we are before I answer that question: we started building Grakn almost from scratch. Building a graph database itself is already a huge endeavor, which is what we're doing; building a reasoning engine on top of that, with a whole new language, is a big engineering effort.

You invented a language; that's very difficult.

Yes. I don't want to boast about our achievements, but we're very proud of what we've done. There's been a lot of effort, a lot of sweat and tears, and not a lot of blood; we've worked very hard to make it happen. However, there have been some architectural decisions that, up until this point, have affected performance. Performance has always been an important issue, but unlike some of our friends in the space, performance hasn't been the single most important goal for us to achieve, because we always knew how to get it later down the line. So the
architecture of the current Grakn version, 1.8, is based on JanusGraph and Cassandra; that's what we took, and on top of that we built Grakn. I think most people who know the space know that JanusGraph on Cassandra isn't particularly performant with a lot of data. However, it allowed us to go to market, build the community, and prove that this is something people wanted to use; it helped accelerate a lot of the development. As of right now, hopefully in two weeks' time, we're going to get rid of Cassandra and JanusGraph: we built our own hypergraph database over the course of this year, and instead we use RocksDB under the hood as a very low-level persistent storage engine.

So you used them as a stepping stone, because they were out there and you needed to get to market, and now you're taking that next step. Makes sense. Those are things that are very scary to rip out over the course of a year; high risk, high reward.

Yes. In terms of performance, we'll see one or two orders of magnitude improvement when Grakn 2.0 is out.

But in a general sense, how big of a data set do you think works really well in the current situation, understanding that you are very soon going to be expanding into bigger and better things?

We've tested that Grakn works with roughly a couple of billion data concepts; that's fine. The difficulty is that sometimes there are very specific reasoning queries that lead to a bottleneck on one particular inference, and that can slow things down, but that's because of that one bottleneck. Overall, Grakn works fine on a large data set; it's more the overall read and write performance that we're improving, and also the scalability.

All right, Tomas, I really appreciate you taking some time. How would people get hold of you or others at Grakn to find out more?

If you go to grakn.ai/discord you'll
be able to sign up for our Discord. We've also got a community swag program.

I want to thank Tomas very much, and Grakn in general, for joining me today. Go and check it out, because at least it's free and open; go and find out if you like it yourself.
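For reference, the risk-exposure query Tomas runs in Workbase would look roughly like this in Graql 1.x. This is a sketch reconstructed from the conversation; the exact type and role names (risk-score, risk-level, risk-subject, risk-value, risk-exposure) are assumptions, not copied from the demo data set.

```graql
## Find banks exposed to a high risk via a risk-exposure relation.
## $b and $r are variables; "isa" binds each to a type or any of its subtypes,
## which is why $r can come back as cybercrime, war or civil-unrest.
match
  $b isa bank;
  $r isa risk-score, has risk-level "high";
  (risk-subject: $b, risk-value: $r) isa risk-exposure;
get;
```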
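The person/company/employment schema walked through on the slides would be declared along these lines in Graql 1.x. Syntax is reconstructed as best I can from the Grakn 1.x documentation era (in Grakn 2.x, for example, `datatype` became `value`), so treat this as illustrative rather than copy-paste ready.

```graql
define

## Attributes are first-class types.
name sub attribute, datatype string;

## A person is an entity that has a name and can play the employee role.
person sub entity,
  has name,
  plays employee;

company sub entity,
  has name,
  plays employer;

## The employment relation connects the two roles.
employment sub relation,
  relates employee,
  relates employer;

## Single-type inheritance: subtypes inherit attributes and roles,
## so a startup can play employer without re-declaring anything.
customer sub person;
startup sub company;
```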
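The inferred-relation chain in the demo (a cyber attack on a sub-subsidiary surfacing as an attack on RBS itself) is driven by user-defined rules with a `when` body and a `then` head. The rule below is a hypothetical sketch of the kind of rule behind that behavior, not the actual one from the demo; the `ownership` relation and `owner`/`owned` roles are assumed names.

```graql
define

## Hypothetical rule: if $a owns $b and $b owns $c, infer that $a owns $c.
## Grakn derives the inferred relation at query time rather than persisting it,
## and Workbase flags it as "inferred" with an Explain button.
transitive-ownership sub rule,
  when {
    (owner: $a, owned: $b) isa ownership;
    (owner: $b, owned: $c) isa ownership;
  }, then {
    (owner: $a, owned: $c) isa ownership;
  };
```

Because rules are deterministic, a bad inference always traces back to a bad rule or schema definition, which is exactly the "penguin" point made above.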
Info
Channel: Ashleigh Faith
Views: 286
Rating: 5 out of 5
Keywords: what is information architecture, Ashleigh Faith, search engine optimization, knowledge graph, knowldge graph, linked data and KG, ontology, how to make an ontology, how to make a knowledge graph, how to make a knowledge graph and ontology, how to make an ontology and knowledge graph, what is IA, grakn labs, grakn, graql, hypergraph, hypergraph and grakn, categorical data and hypergraph
Id: wGIu1xp7z5o
Length: 21min 11sec (1271 seconds)
Published: Thu Dec 17 2020