Type Refinery - the Next Step in Knowledge Graph Visualisation

Captions

Hello to the Grakn community. I'm very pleased to be here. I hope everyone's still hanging in, because I'm the last session, so thanks for hanging in. Type Refinery is the next step in TypeDB visualization. It came about when I was working with Grakn, as it was called then, about 14 months ago, and I noticed something. I was working on process mining. For those that know my interests, I'm very interested in time, and process mining is a key way of dealing with time. Process logs are a very simple pattern, a common kind of Grakn data design pattern, similar to a Russian doll in that we have components nested inside each other: a log contains one or more sessions, a session contains one or more events, and an event contains one or more pieces of data.

Graphically, what we're looking at is a nested scenario where I've got a session, I've got events, and I have a directly-follows relationship between them. Right at the end I have an end event; in process mining we often put in dummy stop events to make it easy to query. And then I have multiple sessions. So essentially I have objects within objects within objects. This might be a very, very simple data set, and I can model it very quickly inside TypeDB. If you want to look at the actual TypeQL used for it, you can see that we've got an event with a single piece of data, an event name; we've got a log that's got its own name; the trace is really what holds the session, and the trace ID is the session ID; and you can see the directly-follows relationship right there. So that's pretty much as simple as data can get.

If I just wanted to put in letters as the data, I've got simply a five-row matrix, with the first three rows being A B C D, the next two rows being A C B D, and then the final row A E D. So my table of input data would look like this. All very straightforward. It leads to a pretty
complicated visualization, which is going to look like that. We can just do a quick example here and get the query. When I was working on this a year or so ago, I would get feedback that looked like this, and my question was: have I loaded the data correctly? By the way, just so everybody can understand the power situation, I'm running the server locally, and quite a few things locally. What we can see is that we do get the responses back pretty quickly, but essentially it comes back as a fairly good hairball. Through dint of hard work I can start dragging some of these out to make sense of it and actually lay it out; it takes about 10 or 15 minutes to lay it out so I can see the data structure behind it. But it certainly is tricky to see what's inside it. So even with very simple data, it's very hard to answer the question of whether I loaded the right data. When I was beginning with Grakn, that turned out to be the most important thing I was looking for: have I put the right stuff in there? It turned out to be very, very difficult, and at the time I thought, there's got to be a better way of doing this.

The other thing was that I'd kind of been jealous, because I've seen some quite nice layouts. As an example, here's a couple from the literature. This is the Serengeti food web, and what's interesting about it is that the person has laid the data out according to properties in the data. Down here we have a vertical row with horizontal groupings: these green things are plants, the blue things are herbivores and the red things are carnivores. What you can see is three vertical axes with a number of little horizontal axes working together, so there's quite a strong layout in that data. Another approach is looking at a biological network: if I've got a biological network, I need to be able to set this up as layers and try to
understand it. There are also high-end property graph tools; here's an example from SemSpect, and what we see is a set of subgroups and a question-and-answer process being driven through. Moving on to my favorites, which are layered viewpoints: here's a journal which contains the Pizzagate theory and the results of it. And finally, the one that I decided I wanted to build is something like this, where I could show things through an isometric layout. You'll notice here I also have groupings, because I've got horizontal groupings across each layer, so we have three layers and then groups within them. And I'm like, why can't I do this with Grakn? What's the problem?

The more I looked into it, the more I realized that basically I needed a whole new visual grammar. When you look at why I can lay stuff out pretty well with a property graph, I've got essentially two types of structures, and they work out as composite data objects. Each one is a rich piece of data: I've got a person or a city, and all these properties are filled out. That makes it very easy to lay out, because I can say, use property one as the vertical axis and property two as the horizontal axis. Essentially they come out like mini tables. If instead we look at TypeDB, we've got two different structures. We've got a thing that can own a sub-attribute, which is a common structure where we have an entity, an attribute or a relation that owns a sub-attribute. We've also got relations that have roles that lead to things. So in TypeDB we don't have composite data objects; we've got atomic data objects, and everything's loaded in patterns. That's going to make it real hard to lay out. So the first thing I'm thinking is, okay, if I want to do layout, somehow I've got to form rich
composite objects, and to do that I'd need to form some kind of semantic disjoint groups. But, hey presto, the attributes in Grakn are unique. By that I mean, if we look here at one of the event names, we can see that this piece of data, this event name, is linked to a whole load of instantiations, or uses, where that piece of data is used. What that means is that visually, the events which use the data are clustered around the data. So while unique attributes are a very useful thing, you can't group with them. So one hypothesis is that I'd need to create some kind of local copy, a shadow copy, if I wanted to do grouping. The other thing is that unique attributes are visually complicated, because the way force graphs work, with force and tension and so on, is that they cluster things around them. So I'd actually need to disconnect them if I wanted to get some kind of layout that showed the semantics of what was going on. The other problem is that if I try to group things, it can look complex, so ideally I'd need some kind of self-describing capability: if I group things, it could give me a summary of what's inside that group. And finally, even if I group them, they're not objects, so I'd need to be able to collapse those groups into objects. I see these as the five causes of TypeDB visual complexity, and I needed all five actions in order to get visual clarity.

Looking into the sub-patterns a little bit more, I focused in on four different types of disjoint groups I was interested in. The first one is where I have a thing that owns a sub-attribute. Its complement is essentially exactly the same scenario, where I've got a relation that owns roles, and things through those roles. So in my system both of those patterns are exactly equivalent, and the sub-leaves are downward edges. And there's these
two types. When you look at group two, that's really where I've got a nested group. In this I've got a thing that owns a sub-attribute, and that's linked by some upward edge to a relation that has a role, and that's been previously grouped. That's when I want to get into groups of groups. There are two essentially equivalent situations, and it's really about the direction of the edge and the joining of things, but the concept is that I've got a main leaf with sub-leaves, and that also owns a sub-pattern. So that would be group type 2, and again this is an extremely common type of group inside Grakn and TypeDB.

Next, what we often want to do is select groups of groups based on values. That's where I've got a couple of different groups, and then I've got some value that I'm going to make disjoint so that I can select on it, for example a session ID, or whatever I want to collect things by. So my third type of group, what I call a group of groups, is where I've got these essentially nested patterns, and they have some common attribute through which they can be grouped. And then finally I've got those things also grouped, so I've got a main leaf with sub-leaves that owns groups of groups.

Given that, what we have is essentially a group taxonomy with four different types of groups that we want to look at if we want to visually simplify Grakn. Essentially there's a leaves layer, where I've got groups of leaves, and then leaves connected to groups; and down here are what I call groups of groups, and then leaves connected to groups. So when I say type one to type four, they're the types of groups that I'm looking at.

I built a little research prototype to prove the point about just changing the force graph algorithm, because some people said to me, "Oh yeah, just get Dagre or, you know, wired or some kind of layout thing, and it can solve the problem." Well, it doesn't, and I
can prove that if we follow my five hypotheses, it will actually achieve visual simplification, and that following all five clearly does make a foundation for a new visual approach. The other question is whether the taxonomy of groups that I've put forward accurately represents some kind of general TypeDB semantics. It might not cover all scenarios, but it certainly covers enough to make it useful. I'll note up front that the software is a research prototype, not a production system, and it's only got about 20 lines of CSS in it, so someone with a greater level of expertise could certainly make it commercial.

So let's have a look at what a structure from Grakn might look like; I'll just run that query again. Here we have a simple query where we've got a log with a log name, L1; it's got an event with event names; and I'm looking for the traces. The first thing I wanted to say is, yes, I do have themes, and we're going to start off with the slate theme. The internet's a bit unstable, apparently. What we're going to do is run a query; I'm just going to type that in, and we're going to run it off my local database. What you can see straight away is that we've come back, and this is actually using the D3 force algorithm, and we've got something very similar to the Grakn picture. The difference is that I've done some summarization, so you can see I've got these little dots here; I've solved the problem that you guys have here. But essentially it looks very similar. There are no real differences in the way it lays out, and the information's not clearer: I can see the clustering around the event name. I can change algorithms, and this WebCola algorithm that I'm using will slowly separate things out. Now, this will give some visual clarity to the structure, but what we're still going to notice is all of
these events, which essentially come from different user sessions, are all clustered around the attribute. So there is no way of understanding what any individual session does here, and it's all a bit of a wash. Now I'm actually going to change out of this and change the theme, because as we get more visually complicated... well, what you can see here is I've got a slightly different theme, and the colors work out better once we get a few things going on.

What I'm going to do is start adding a few groups. This is what I call a shallow system: in other words, I look through what we could call the apparent schema, which is the schema that comes back in the query, and looking through that allows me to select any of the downward-looking things and start forming groups around them. So the first thing I'm going to do is form a group of leaves, where I've got a main leaf and some sub-leaves, and I'm going to call that an event record, because that holds my event data. As you can tell, my CSS has not always been successful with my theming, so it's a little bit dark there, sorry about that, but I'm sure that could get fixed pretty easily. Now I'm going to define that group straight away, and what you can see is that it goes to the server and then gets saved in here. I then define another group using that group, so I'm going to put in the trace, which holds the event data, and now we've got the trace record. Let's make that one yellow. What we can see straight away is, here's our clustered data: there's the actual event entity; there's my event data, which normally just has a letter in it, as that's what you do in process mining; here's my array structure, where this is the array index and this is my session ID. So that's my little group there. I did actually experiment with constraints and things, like being able to set, you know, indent and level,
because the whole idea of this grouping was to form hypercards. But I've actually removed most of the constraints at the moment, because really I just wanted to focus on getting the grouping right, and we'll do constraints looking forward. So let's just define that group, and you can see now I've got a group and a nested group. Finally we're going to define a group of groups, and then we're going to go and look at that. What you're seeing here is that I've got a trace ID grouped by a distinct value, which is the index, and when you look into this you can see that I've got index 0, index 0: they've all got the same index 0. Clearly that's not a useful thing. I actually want to group by the trace ID, because that's my session ID, and I can see my numbers change here, and now I'm all grouped by trace ID 0, and they've all got trace ID 0. So that's my user session. I'm going to mark that, we're going to color it orange, and then we're going to save it.

Now that I've defined those groups, what I want to do is go and look at the visualization of them. I could have gone part way through, but I wanted to show people the derivation. So if we use WebCola, the first thing I want to look at is this idea of adding shadows. You can see the shadows are these orange things here. What we can see in here, as we zoom in, is that between this event name A and all the events that it was connected to, I've inserted a copy, a shadow attribute. It's a shadow, which is why it's lighter. It contains exactly the same value, but now it's connected to the event: every event has a local copy of this data. And I can actually disconnect those events and see if that clarifies the situation. If we do that, what you can see is that, yeah, I've got some clarity going on here. You can see it's starting to separate. I've got a trace ID 0, and that's second in the list, and it's got an event and a
piece of event data, and I've got another trace ID too, so I can see it's separating the individual pieces. But what you notice is that horizontally, as I move across, it's all mixed up. So it's certainly not a sufficient representation. Visually it's significantly simpler than this, but it's not a fantastic visualization; it's not quite there yet.

So the next stage is to look just at the groups. If we apply groups here, look what we can see. Again it all gets fairly complicated: we can see the groups, and we can see them connected to the attributes here; for example, I can pull that attribute out. Now what I'm going to do is disconnect the attributes, because I want to visually simplify things. We're going to remove the dark red things, and that leaves us with a much simpler picture again.

Now, when I was talking before about self-description: I really need some way of understanding what's going on. You can see that when I move over that trace record, I've got a summary, an auto-summary, of what's inside it. The interesting thing about how you summarize things is that the more containers you move through, the bigger the elements of data that you're trying to summarize. A piece of data has a very simple description. A group that just has two pieces is analogous to the way that I can add attributes in Grakn, or Workbase, currently. If I move up to a more complex group, you can see that the full summary is written there. When I go to our final nested group, I can see that it's a session, it's got four trace records, and the condition through which it was grouped, or made disjoint, was where the trace ID equals three. Here we can see the trace ID equals five, and the trace ID there too, et cetera. So I can mouse over these things, and yep, there's some
simplification here. I could mouse over this and check back with my original matrix, and go, well, this is the thing that I'm really looking for: can I see that structure? I'd give it maybe a 5 out of 10. I can mouse over and see everything, but it's still pretty complicated.

So the next stage is really about collapsing some of these groups to form composite objects, or rich objects. I can collapse groups at different levels. If I collapse the event data, and again disconnect the attributes, you can see those gray boxes have now collapsed to a gray circle. It's got all the data inside there, but now it's visually simpler again. If we collapse down to the next level, and again disconnect the attributes, we can now see that we've got a very, very simple picture indeed. We can see, for example, that this one has trace ID 0, and I could drag and drop these things and arrange them so I could understand exactly what was going on. I'm not going to go to that level of detail, but what you can see is five traces, and this is the one with the three units. So by dragging and dropping things I could visually simplify it from here, but there's really no point. The point is, I think I've sufficiently made the case that visual simplification, through collapsing disjoint groups into composite objects, is a way that we can really get at the value of the data that's in Grakn.

That's all very academic. I love my little theories, and I've played my little theory out, so let's look at a bit more of a practical example. This one's inspired by Daniel, because when I was showing through
this and asking what I should do for my demos, I said there are two things about groups that I really like. When I first started with the groups I didn't think about collapsing; that thought only really came through about three weeks ago. But once I collapsed the groups, I realized there are two things I can do with them that look really interesting. One is that I can lay them out on the properties that they're based on, and number two is that I can create a table export. Now, table export is maybe not that thrilling to a lot of people. It's thrilling to me, because it allows me to use a whole load of standard charting tools. I said to Daniel, should I do the table export capability, or try and do some kind of layout thing? And Daniel goes, try and do a layout.

So the one I'm going to demonstrate here is really a shout-out to Haikal and James's tube network system. I'm going to go and have a look at one of the images from the tube network. I don't know if you guys have looked through the examples, but the tube network's a super interesting one. Let me just not talk but get it... tube network images, there we go. If you haven't looked at the tube network before, it's going to look something like this: we've got a whole load of tube stations within London laid out, there are a few lines which route between them, and then some data that's shown over the top of them. Now, I can't show everything in that, but I can certainly show a first step towards doing something similar, but from a force graph. So I'm going to take the first 100 train stations in London, and I'm going to load that query. I'm not quite sure why that hasn't worked. Hold on a second. Let's change this, and let's have a look at
what it will look like, and just check that we've got the database going. Oh, I know: the problem was that I have to change the keyspace. Whoops. The query didn't work because I had the wrong keyspace. I've done that before. Sorry about that. Load the query. There's quite a bit of data here, so to give you an indication of how much we're looking at: there's a hundred entities, and there are four attributes for each entity, so we've got a hundred lots of this. I'm going to drill into it so people can see what's involved. We've got a station; it's got a longitude and a latitude; it's got a NaPTAN ID, which is how I link it to the British rail network stuff; and it's got a title. I've got a hundred of them, and there's something like 900 edges, so I've got quite a lot going on there. If we have a look at what Grakn Workbase will make out of it, we can see that it'll pull the query really well, but when I go and look at the result it's again going to be pretty messy looking. One of the key problems here (we'll wait for that to come up, but I'm going to look in at one of these things) is that my data is in pieces: this is separated from that. In a mapping scenario I need those two linked together. I need a composite object that has a longitude and latitude. I can't just have a longitude and then map that without any real instructions, so as it is, it's not going to work that well.

Now, we haven't got that back from Workbase, so I'm going to show you how composite groups can solve this problem. We're going to group the station up, first by the NaPTAN ID, then the name, and I'm going to add the longitude and latitude. And I will say, although I did this for Haikal and James, to show them that I could produce an interface so you didn't need the Tkinter stuff,
I didn't want to do too much hand waving, but I do have a little bit of hand waving going on under the surface. Ideally, in an interface, I would have a control here where I could click these and say, register these two fields as a location, make this one a longitude and make this one a latitude. At the moment I've got a bit of code in there that says, if it contains the string "lon" it's a longitude, and if it contains the string "lat" it's a latitude. So it's a bit of hand waving. Ideally I'd have a way to select times or locations or other properties that I wanted to lay things out by, but a little bit of hand waving for the sake of the example. We're going to call this a station, and we'll make it blue, and I'm going to move the little zoom box out of the way. You can also see I don't quite have the constraints working properly: we've got a center where this is off; I've indented it, but when I do the left edge it didn't quite pull together. So I didn't end up going forward with constraints for the demo. Sorry about that, for anyone who's massively disappointed about the constraints. This is really just about grouping, and showing you the power. It's going to make a mess out of it, kind of similar to what this has made out of it. So if your job is to look through that, good luck, because that's just going to be tricky. There's a lot going on there and it's real hard to understand what's going on.

As you can see, I can't make it much better, but what I can do is group it. Now, the WebCola algorithm centralizes it all, which makes it a bit annoying, but you can see that if I show the groups... it's going to take a little while, because there's a hell of a lot of things going on. What you can see is that the groups are still connected to all the attributes, so I'm going to
disconnect the attributes, which is going to visually simplify it. What you can see (let me get the right scale) is that each station is now grouped. Great, I've got the composite object, but it's not massively simpler. Let's collapse those groups into stations. It takes a little bit of processing; this might take a minute or two, because there's a hell of a lot of objects going on. I can also disconnect the attributes. Okay, my main demo should be working, but for some reason... I had a little map button, and essentially what it does is just draw them out between minimum longitude and maximum longitude. But I seem to have a little bit of an error going on, because I can't see the map button. We could just run that again and see if it comes up. Let's just try and collapse that again. If not, you're going to have to take my word for it that I can collapse the groups into stations and then apply that longitude and latitude. But certainly that map button has not turned up, so sorry about that, I can't finish the rest of the journey. What you can see, though, is that each of these stations has the detail: this is High Street Kensington Underground Station, it's got the NaPTAN ID, and it's got the longitude and latitude. So you can see that these are actually mappable objects. And although the last part of the demo broke, which is laying them out on longitude and latitude between the minimums and the maximums, you can see the evidence is here that visual clarity can be gained by using disjoint groups.

So I'm just going to go straight into my summary, because I think everyone's probably bored by now. Basically, I think we've proven that the disjoint groups can map the TypeDB semantics. The shadow attribute adds clarity, but it's not sufficient by itself. Disconnection of attributes is important, but being
able to describe the groups turns out to be a key thing. But it's really collapsing the groups that then enables things like longitude and latitude layout to be done. I apologize about the demo; it's always the way with demos. I must have tried it like 30 times for this thing, so who knows why. But I think I've demonstrated that there's a foundation here through which we could visually clarify TypeDB and its data in quite a powerful step. And although it's only a foundation, there's a whole load of other things you could do. Ideally, in the future I'd like to do Euler groups. I've done disjoint groups, but joint groups would be super useful, because there are two set patterns in TypeDB: one is the disjoint set and the other is the joint set. What you can see here is a disjoint group, which is the square box, and overlaid on that are these joint groups. Joint groups are where I've got some overlap with some other group, and this is a very common thing in Grakn. So I feel that when we have the disjoint groups and the joint groups together, you can actually get some huge clarity. I've pretty much done the semantic disjoint groups; the next things are to start doing the local constraints to set up fixed dimensions, do 2D page layouts like you've seen, and get a table export going. Then in the future we'd be looking to do isometric-type layouts and intersection groups, and that would create quite a powerful way of visualizing Grakn.

Anyway, that's pretty much all I wanted to say. Thanks to Dr. Nick Ferguson, a mate of mine who works for Airbus; he's a guru in architecture and modeling, and he suggested WebCola. Professor Tim Dwyer, who wrote WebCola: thanks, Professor Dwyer. And Daniel Crowe suggested that I build this prototype for Cosmos. I'm sorry if I've disappointed anybody with my not
excellent code quality, but I think I showed some interesting ideas, so I hope people were interested. Anyway, I think that's about me done. I'm assuming there are no questions and that everyone's gone to sleep by now, so thanks for attending, and think about Type Refinery. Oh yeah, you guys can email me on modeler and on Discord, or you can hook up with Daniel if you're interested in what the future of my tool is. Thanks very much for your time. Goodbye from Australia.
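
The shadow-attribute and disjoint-grouping ideas described in the talk can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Type Refinery code: the flat row layout `(trace_id, index, event_name)`, the node-naming scheme and every function name below are hypothetical, and the letters simply follow the input matrix read out in the talk.

```python
from collections import defaultdict

# Hypothetical flat rows, (trace_id, index, event_name), standing in for what a
# query over the process log might return. The letters follow the talk's matrix.
TRACES = ["ABCD", "ABCD", "ABCD", "ACBD", "ACBD", "AED"]
ROWS = [(t, i, name) for t, names in enumerate(TRACES) for i, name in enumerate(names)]


def shared_attribute_nodes(rows):
    """Unique attributes: one node per distinct value, shared by every event."""
    return {f"event-name:{name}" for _, _, name in rows}


def shadow_attribute_nodes(rows):
    """Shadow copies: one local node per event, holding the same value."""
    return {f"event-name:{name}@event-{t}-{i}" for t, i, name in rows}


def group_by_trace(rows):
    """Disjoint groups: each event record lands in exactly one session."""
    sessions = defaultdict(list)
    for t, i, name in sorted(rows):
        sessions[t].append({"index": i, "event_name": name})
    return dict(sessions)


def summarise(trace_id, records):
    """Auto-summary for a group, in the spirit of the hover text in the demo."""
    return f"session: {len(records)} trace records where trace-id = {trace_id}"


sessions = group_by_trace(ROWS)
print(len(shared_attribute_nodes(ROWS)), len(shadow_attribute_nodes(ROWS)))
print(summarise(5, sessions[5]))
```

With shared attributes every event that uses the letter A hangs off the same node, which is what pulls a force layout into a cluster; with shadow copies each event carries its own value, so grouping by trace ID yields self-contained sessions that can be summarised or collapsed.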

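The station example in the talk relies on two steps: tagging fields as longitude or latitude by substring match, and drawing collapsed stations between the minimum and maximum coordinates. A rough Python sketch of both follows. It is an assumption-laden illustration rather than the prototype's code: the attribute names, the canvas size, the function names and the three station records (including their coordinates) are all made up for the example.

```python
def classify_field(name):
    """Substring heuristic from the talk: 'lon' means longitude, 'lat' latitude.

    It is brittle by design: a field called 'london-zone' or 'platform'
    would be misclassified, which is why a proper UI control would be better.
    """
    lowered = name.lower()
    if "lon" in lowered:
        return "longitude"
    if "lat" in lowered:
        return "latitude"
    return None


def collapse_station(attributes):
    """Collapse separate attribute values into one composite station object."""
    station = {}
    for key, value in attributes.items():
        role = classify_field(key)
        station[role or key] = float(value) if role else value
    return station


def to_canvas(stations, width=800.0, height=600.0):
    """Scale each station linearly between the min and max longitude/latitude."""
    lons = [s["longitude"] for s in stations]
    lats = [s["latitude"] for s in stations]
    lon_min, lon_span = min(lons), (max(lons) - min(lons)) or 1.0
    lat_max, lat_span = max(lats), (max(lats) - min(lats)) or 1.0
    return {
        s["name"]: (
            (s["longitude"] - lon_min) / lon_span * width,
            (lat_max - s["latitude"]) / lat_span * height,  # north at the top
        )
        for s in stations
    }


# Made-up records; the names, IDs and coordinates are illustrative only.
RAW = [
    {"name": "Station A", "naptan-id": "ID-A", "lon": "-0.19", "lat": "51.50"},
    {"name": "Station B", "naptan-id": "ID-B", "lon": "-0.10", "lat": "51.53"},
    {"name": "Station C", "naptan-id": "ID-C", "lon": "-0.02", "lat": "51.51"},
]
stations = [collapse_station(raw) for raw in RAW]
positions = to_canvas(stations)
print(positions)
```

The point of the sketch is that the layout step only becomes possible once each station is a single composite object holding both coordinates; separate longitude and latitude attribute nodes give a layout routine nothing to pair up.
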
Info

Channel: Vaticle

Views: 169

Rating: 5 out of 5

Keywords: typedb, data visualisations, visualizations, knowledge graph

Id: i2q-qscmbgM

Length: 40min 17sec (2417 seconds)

Published: Mon Apr 26 2021