.NET Data Community Standup - Using hierarchical data in SQL Server and PostgreSQL with EF Core

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
[MUSIC] Count down beginning. Two minus two minutes. [MUSIC] >> Jeremy Likness, Shay Rojansky, Brice Lambson, Arthur Vickers. >> SQL Server, PostGreSQL, SQLite, Oracle, MySQL, Cosmos DB. >> Net Data, T-1 minute. C#, F#, VB,.Net Data, Topology Suite, 0Data, GraphQL, EF Core Power tools, Node Time. >> 30 seconds. >> Open source software, performance, System Data, Dapper, ASP.NET Core, Entity Framework Core. >> Ten, nine, eight, seven, six, five, four, three, two, one. Lift off. We are going for.NET community stand-up. >> Here we are again,.NET data community stand-up. This week we're going to talk about hierarchical data. I'm joined by Shay from the.NET DataAccess team as always, and then also Brice who has worked with the community on this over the years. He's going to talk in a little bit about the community history of this. As is always the case it's been a little bit of a scramble of getting everything ready. If there's some live demo issues, then that's the way it goes, but we'll see what happens. Let's start. As always, with the state of the unicorn. Seems like forever ago that we released these things now, but it's actually less than two weeks. EF core 7.0.4 is out there. Seven more bug fixes. We got a lot bug fixes in 7.0.3, but they're still important stuff still coming in so get 7.0.4. Actually one of those bug fixes it's a Microsoft data SQL client not in EF and we always forget about it. We'd never mentioned Microsoft data, SQL client and its Brice's thing that he created and everything. Sorry, Brice that we always just gloss over that, but there's a bug fixing Microsoft ADC client. I guess it's good that it's hardly ever any bug fixes in it, but there is one in this case. Then also the EF Core 8 Preview 2 is on NuGet, which has SQL Server hierarchy ideas, official support, which we're going to talk about today. Then also JSON columns in SQLite. This is the same support we have for JSON columns in SQL Server, now works on SQLite as well. We're still working on lots of improvements in the SQLite space. Marie, she's working hard on that, but what you could do in SQL Server, you can now do in SQLite plus because there's already been some enhancements to JSON columns support in Preview 1, and those also work on SQLite. As always, aka.ms/ef8-new, that's the watch new in EF8 and there's runnable samples on GitHub. I'll try and upload the code that I'm doing here to GitHub later. I haven't done it yet because it's been a scramble. But also the code from the stand up I'll put there. Let's talk a little bit about hierarchical data. What we mean by that. We'll get into this a little bit more later when we look at differences between Postgres and SQL Server as well. But for now we're just going to do the common denominator type thing. We've got a very famous family tree here of hobbits. I've been trying to avoid seeing trademark terms in writing them down in any of the online stuff. We refer to as hotlinks and anything that's written down. >> I don't think hobbits are a trade trademark thing. >> Tolkien Estate. Yes. >> Are you saying he invented hobbits? I'm very surprised. >> Yes. >> Isn't this an English word from the whatever? >> Oh, well, we could go into the likely history of it. >> We absolutely should. >> Anyway, if they had to cease and desist and tell us to remove this YouTube video, then we will do that. But I figured I can be a little bit more free on the video that I have been on stuff we publish on the.NET network or whatever. Anyway, what do we mean by hierarchical data? Essentially tree structures. I mean, that's basically what it means. You've got a root and then you've got nodes down in the tree going to leave. That's essentially what we mean by hierarchical data. SQL Server has a particular syntax for representing these things. Postgres syntax is more flexible. But what I'm going to do is just to make this really clear, I'm going to go through this tree structure and show the SQL Server syntax so that when we get to looking at the code, you can mentally move back between the tree image and what you're seeing in terms of what's in the code. A SQL Server, the root is just a slash. That's the patriarch of the family. Of course, this Tolkien's day, this is a patriarchal family tree. That's just the way it is. Sorry, but that's what's in the book. That's what we're using. Anyway. Now we get to the first level. SQL Server is the first level. Each level is basically prepends the level before it. The root is /, and then the first level is /1/, /2/, /3/, /4/, /5/. These are all with children on the first level. Now when we go down to the children of Mungo, again, all the children Mungo are prefix by Mungo's hierarchy ID. This is what this is. This is the hierarchy ID. The first child is Mungo's prefix /1/, and then 2, 3, 4, 5. Likewise, if we go to Ponto here, since Ponto was /3/, then Rosa is /3/1/ and Polo is /3/2/ and so on for other ones. Then likewise, when we go down to the next level, it's just the same thing. The prefix, and then you add another suffix on the end of it. Bilbo /1/1/1/, so you can tell why Bilbo inherited back-end because Bilbo is the first-line male descendant in the Baggins' family tree there. Otho, somewhat infamous and Falco, /1/5/1/, and then Posco and Prisca, I assume that you say that. Then of course now we're going down this other side of the family tree where everything is from Largo, so these are all prefixed with /4/ and we get down to here, we've got /4/1/2/ and so on. Then final Lotho, very, very infamous Sackville-Baggins there. Poppy, Ponto, Porto, and Peony, and then Frodo, as I think most people have heard of Frodo, so /4/1/2/1/. I think we've brought two more here. Then Angelica is the only fifth-level one we have right now. This is basically what we mean by a tree structure, a hierarchy. Now, I can say Postgres, there's a bit more flexibility in this, but let's stick with the SQL server convention just to start with. >> I'm going to switch to show some code. I've got a lot of code here. We're going to look at, I've got both SQL Server and Postgres examples in here. We're going to look at both how you do this in Postgres and SQL Server. Before we go into that, let's talk a little bit about the packages involved, the history of how this came about, and this is for SQL Server specifically. The hierarchy ID is actually been supported for quite some time. Let me find my window here. It's been supported quite some time by this community project. Do you want to talk about how that came about Brice and the history of this? >> Yeah, sure. Way back in EF Core 22, which was released in November 2019 or somewhere around there. Is that right? That might be right. Anyway, whenever that was, I was doing a bunch of the spatial work which was bringing in the SQL geometry and the SQL geography types. While I was doing that work, I'm like, we can easily do this the exact same for hierarchy ID, but the problem was there wasn't a hierarchy ID library out there, a special we chose a topologysly. But for the hierarchy ID I found a community packaged by Martin Nelson who actually brought in a bunch of the spatial stuff. It was unsupported though, we were hesitant to just say, hey yeah let's merge unsupported code into the main Entity Framework Core branch. I spiked down some code, I wrote down some code, and I put it out there. I don't even know if it was working completely, then I of course got distracted with other features. But a community member named Adam Jones wanted hierarchy ID support. He found it and he brought it up to speed and he got it working and end with test and so we released it. I think we even released a compatible with Entity Framework Core 22. Then we've kept stringing it along, there's been lots of contributions to it, to evolve it, to add this functionality. But it wasn't until last year, earlier this year that the SQL Server team finally released official SQL hierarchy ID support. What we did is we reached out to all the community members who had contributed and said, hey, would you guys want to contribute all of this into the main Entity Framework Core projects so it can become official and support it, now that we can have end-to-end support, they all happily agreed. Obviously, they wanted hierarchy ID support [inaudible], that's why they contributed. I just recently in time for EDO preview too. Actually merged all the code in, that's how it's finally an official package, but the package has been around for a while, it's been pretty heavily tested by the community it has some effects but the functionality is pretty well. >> Thanks to everybody who's in the community who's contributed over this, you made this happen, we really appreciate it. Also, if you're watching this and you're like I'm not on EF8 yet, remember there is a community package here, you can do this. I mean, I don't even know how far it goes back, at least three WAN which isn't even supported anywhere. Basically whatever the version of support EF Core you're using because obviously, you're not using an unsupported version. Every version of EF Core that you're using has some form of hierarchy ID support, which is basically, this may be a few differences, but basically the same as what we're going to show here, basically the same code. Let's talk also a little bit about packages for hierarchy ID. There's three levels of packages and assemblies. This is usually not that important. Usually what you want to do is just include this in your project, the Microsoft Entity Framework Core SQL Server hierarchy ID project package. That brings in everything you need to write queries to integrate with EF Core and all of that stuff. But actually, this package itself is dependent on this package called SQL Server abstraction, which is another new package. This has the actual hierarchy ID type that you can put in your entities in it that you'll see in a minute. Like all of our abstractions package, this is a very light dependencies. See it doesn't depend critically on any other Entity Framework packages. If you are trying to keep your data access code if you like out of your model but you want to use hierarchy IDs in your model, then you bring this in and you get very minimal dependencies, very minimal impact on your deployment. Then as Brice mentioned at the lowest level this is the Microsoft SQL server types package. This is the one that their SQL Server people have released to actually support various custom EDT essentially that we're in SQL Server. This is the special types of geometry and geography and the hierarchy ID type. Are there any of these Brice? >> No [inaudible]. >> This is at the lowest level thing. These things are SQL Server dependent, as we'll see in a minute, the Postgres stuff is independent of this. It would've been nice if we had some common thing between them, but there's quite a lot of differences. As Brice says it wasn't a community library, we've ended up here, it's not. A lot of the functionality is very similar, but the actual types that use are different. >> [inaudible] putting out to that the hierarchy ID type is backed by the SQL Server types library. That'll work outside of SQL Server. If you're just in a pure dot net app, you can use this and all the functionality will work. >> Yeah, it's a little bit different from geometry and geography in that all of the methods have quite implementations Brice in the hierarchy ID, so that even though like in SQL Server right now if you use one of the geography from this and then you do something, it's not implemented everywhere. On the client side it only worsens a translation, whereas the hierarchical ID is more simpler. >> It actually works on Windows, is because there's a native DLL, it's highly optimized code, they want these operations to be fast. >> But hierarchy ID is not like that, it will work everywhere. >> Pure managed, it's all running. >> Let's go back to the code then. Should we stop for any questions? Sure, I haven't been looking at the chat. >> I don't think. I think let's go see some hierarchy ID, and then there's already some questions but better deal with them after we see some code. >> Okay. In order to use, in order to get hierarchy ID support in with SQL Server, you need to do this. Use hierarchy ID thing that tells us stop, that tells us to map the hierarchy ID type and everything like that. If you're doing Postgres, you don't need to do that because Postgres has the Altria type, which is equivalent to a hierarchy ID built into the ada.net provider it brings it in any way, and so it's all there by default. I'm using User Secrets so that sure I can't see my Postgres password there. Anyway that's neither here nor there. >> Nice. >> Let's look at the little application I have here to test things out, make a clean database, and then seed it with some hierarchy ID things. If we go back and remember this here, so we had all of these, we built up these hierarchy ID strings, so this is what I'm now adding to the SQL Server database. I have a class called Hobbit that has a name, year of birth, and then in here I've got the hierarchy ID which is the path down from the patriarch to this particular item in the tree. I could make this the primary key that's valid. You can use hierarchy IDs as the primary key. You can't mutate it. EF doesn't allow you to mutate if the primary key, and so if you want to repair and pop the tree to another part, then it's nice. Not having its a primary key but anyway, that's an option that is always there. Now, actually SQL Server has this compact performance binary representation of hierarchy IDs under the hood. What we're doing here, what I was saying we Parse from this human readable string format, and to create that representation. I think in API review, we said we're going to make a constructor for this rather than necessarily having to go Parse. Really, you don't need to worry generally about the fact that brightening representation is there, just think about it in terms of the strings and everything's fine. But under the covers it's not actually storing a string in SQL Server, it's storing a compact binary representation of that thing. Does Postgres have that too, Shay do you know? >> I have no idea. It's a purely internal implementation detail inside elsewhere, so who knows. >> Anyway, so yeah, we're doing a normal thing here where I've added all these things and I'm calling Save Changes. Let's look at the Postgres version of this. Now the Postgres version of this doesn't have the same strict syntax that SQL Server does. In fact, all it requires is that you have a dot period character separating each level. You can put alphanumeric characters in there, you can do various other things and we'll come back to that bit later. I have another example down here that uses and shows a bit more of that later. But for now, just to keep things very common between what we're doing on SQL Server and Postgres so we can see the same operations all I've done is, take the same SQL Server representation and put periods in here. This can cause some issues. If you're not very careful with Postgres, because you don't have a lot of unique list like this. If you're looking at sub-tree, for example, the 1.1 exists lots of different places in here and it has different meanings, but we'll get back to that later. But just be aware that you might not want to make it this simple in Postgres, depending on your case we can come back and look at that. We've stored data, let's do some queries. I have got all the queries here cause I was rather than copying and pasting them in from elsewhere, I was getting very confused about which are my Postgres ones and which are my SQL Server ones, so I'm just going to uncomment them as we go so I don't get confused. Find a generations. Well, there's a Typhon, find a generation. But one of the first things that you can do whether or not this is common, Shay, you might say that this is a bit artificial. I think it depends on how you're conceptualizing the family tree, but you can say, give me everything at certain levels. On here, if we say this is level zero, this is level one and level two, so we're saying give all of these at this level no matter what their ancestors are, if they're at this level in the tree, give me them all. In order to do that, the hierarchy ID to by-parse this GetLevel function and you can match that to a level. If we run this, you'll see, I'm running the postscript and I knew that was going to happen. How do I? Maybe I'Il just do it in here, on SQL Server. There we go. We'll store all the data. Actually, before we look at the output of the query, let's just take a quick look at the tables just so we can see the hierarchy ID in the table. Here you can see that we've created a table and we're using this SQL server type hierarchy ID. This is a built-in UDT if that's not a contradiction in terms in SQL Server. When we run this query, and you see we get this translated and it's translated to the Where clause whereas using the GetLevel on the UDT and SQL Server, and as you can see, we're returning all of those hobbits that are at that generation two. In Postgres, let me open the oldest whitespace. It's exactly the same thing. The difference is that in Postgres, its property called NLevel instead of a method called GetLevel. Also, I think I might not have mentioned this, in Postgres the actual type is not called hierarchy ID, it's called LTree, but it means basically the same thing. >> LTree stands for a label tree, like a tree as labels. We'll talk about that a bit later, I think. >> Yeah. Running this, I want to show the table, it's the same as it was in SQL Server just with LTree is the type. But you can see we get essentially the same thing except we've translated to this NLevel Postgres function, and then we get the same results out of here. Pretty simple, but translate it to the server, presumably doing efficiently on the server, that's what the idea of storing these things in this way is that you do it efficiently on the server. We'll stay on Postgres since we're here right now. The example I've done for this hidden what's new docks I realized afterwards was much more complicated than it needs to be. We want to find all generations. I'm doing a GroupBy here, so just group by the level. If we run that, we should see that we get a group for each level. There we go, so we've got each level in the tree which is exactly this level down to Angelica by herself at the end there. Just a little digression here. This is a GroupBy as a final operator. You might notice there's no grouping and SQL, Shay. >> This is something, this query, if you tried this before EF Core 7, this would result in an error. Now, we would not translate this. The reason is previously we would only do GroupBy when we could translate it to a SQL Server GroupBy. In SQL you can't just do a GroupBy. In SQL you have to also use an aggregate function to reduce the groups back into tabular form. Relational databases cannot have groups of things like that, and then you only deal with basically with table form of stuff. But what we did for EF Core 7 to make this very popular query work is basically, as you can see, we translate it to something that doesn't do a GroupBy on this server. We get back all the rows exactly as they are, but we group them for you on the client side. Notice that there is still an order by there. We order by the key instead of grouping by key, and then we get basically all the rows for a certain key value and then all the rows for another key value, and client side inside EF Core, we pack them for you into groupings and give you access to that thing. The key point about GroupBy is when you think about it, just putting a GroupBy doesn't reduce the amount of data in any way. It just packs the shapes differently. Instead of giving a lot of rows, you get the exact same rows, but grouped in some way. The same amount of data is basically coming back from the server, which is why we think it's okay to allow you to do GroupBy like this, so there's nothing here in terms of performance that would be bad. You're just reshaping your data, which is why this is fine. But it does not do GroupBy in this N SQL, which is a little bit weird. >> But it's not inefficient and it's not bad client evaluation or anything like that, it's all fine. >> I like to think of it as to dictionary or to illustrate. Those don't really translate to the database, but on the client we can shove it into that shape. >> Yes, exactly. >> That's very true. >> Yeah. >> Let's go back to SQL server and we can do the same thing on SQL server just for completeness. Find all generations, again, it looks exactly the same. Tap that, we're doing a GetLevel boat this GroupBy in the same way and it will translate or I'll probably run SQL server. How many snaps? That's good. We're doing an older by here in the same way. Translation is the same, Postgres and SQL server in that respect. Let's do something maybe more interesting. Or should we stop for questions at all, Shay? >> No, let's go on. Let's keep going and just show code. There's very legitimate questions of why would I want to use this as opposed to for example, like foreign key constraints within the table like so, and all that stuff is very important, but I think now we're just looking at what this does. >> Let's keep going on. I want to find all the descendants, given Mungo. Let's find Mungo, where is Mungo? mungo there. Mungo has got big loaded descended. I want to find all of these descendants of Mungo. Well, SQL server has a function on a hierarchy code is descendant off. We can use that, it will tell us, the hierarchy that you're looking at is a descendant of the one that you pass it. You know what I mean? Anyway, I mean backwards when I was talking about it. Interestingly, and I don't know if anyone check and confirm this, maybe this is a graph theory thing or something, I don't know. But Postgres and SQL server recognize yourself the thing that you're testing as being both a descendant and an ancestor of itself. If I asked for descendant of Mungo, Mungo is considered a descendant of Mungo, which to me was a bit weird. Anyway, I didn't want my result aside, you started this other course which is like, he got to return me and don't return it. I don't want that one. Let's just run this and we can see the query. Here we go. Descendants of Mungo and here they are. You can do orderings and stuff to get them in the appropriate levels and things like that. But I haven't done that right now. In some of the examples in the What's New, I've done some ordering things there. But basically it's just calling this function IsDescendantOf. If we go to Postgres, we have essentially the same thing here. In fact the method is even called the same, IsDescendantOf, so that looks basically the same. If I run it, the SQL looks a bit different, when we're doing Postgres and SQL server, because SQL server, it's implemented in the database as a UDP type, because the methods look the same whereas Postgres, there's more of a translation from methods in the.NET world to function in Postgres. We see that here, where in Postgres they actually have this funky operator here. Where you're saying, this is basically is to send it off, that's what this operator means. >> Postgres loves operators. Everybody who uses Postgres with the more advanced features know that there's a ton, there's like a zoo of operators with all kinds of characters and it's all great fun. But the story behind this is that there used to be a limitation in Postgres. Where if you wanted something to be sped up by an index, what we sometimes called searchable, then it had to be an operator. You couldn't have a function, though it was just engineered that way and I heard that went away that limitation. But that explains every time they wanted to have an operation that could run quickly like a contains of some sort, then they couldn't introduce a function because they wouldn't be able to speed it up, so they invented some operator and it's really funny. You even have operators with three characters. Because the characters aren't enough, so it gets really wild, anyway. >> The obvious thing to do next, we've looked at all the descendants, let's find all ancestors. Longo here. Where is Longo? Longo is there. So all ancestors would be Mungo and Balbo. Postgres, just like it has its descendant of, has its ancestor of, again filter out because apparently Longo is an ancestor of himself. Anyway, this will similarly translate to an operator on Postgres, but a slightly different one, only slightly different, that was put in the other way. But it all makes sense obviously. Going back to SQL server. SQL server house is descendant of, those in half is ancestor of. But that's not really a problem because if you want to find the ancestors as opposed descendants, you just switch the operands around. Up here we were saying, where the H1 path is a descendant of, and down here in order to find ancestors we're saying where the path of the thing that we're looking for is a descendant of the thing we're passing in, does that makes sense? All I've done is switch the order of the operands, and now I've got the equivalent of his ancestor. Of course this would work on Postgres as well, but Postgres has the operator and therefore the method to do it. It's perhaps a little clearer on Postgres, but it's not a difference in functionality at all. If I run this one, then you'll see that we'll get a query translated as appropriately for SQL server. There we go. Again, we've got his descendant of in there. I think we're going to start getting a little bit different between Postgres and SQL server next. We've been looking at all the descendants. If we were looking at Mungo, the direct descendant of Mungo is, Mungo only has one descendant, so perhaps not the best one. Let's change this to Longo, that will be better. Now I'll find now my query is wrong. No hang on. That's ancestors, never mind. >> I wanted to say something >> We're looking for direct descendants of Mungo. What we want is these five, Bungo through Bingo, but we don't want Bilbo, Otho or Falco, so there we go. SQL server has this nice function, GetAncestor, which basically says given this, give me the first ancestor. If it's two, give me the third level up. We can use this for both finding a direct ancestor by switching around, like we said before, or finding direct descendants. So I can write a query here in SQL server that will quite easily find me only things that are they direct descendants of Mungo. We'll run that just to prove we're not lying. Yes, we don't have to run everything, but it's nice to show the code actually worked and look at the query. There you go, so we're doing GetAncestor there and we can see we've got Bungo through Bingo on there. Actually, let's find direct ancestor in SQL server first as well, just so we can get out of the way because it's basically the same thing but with, again, the GetAncestor goal on the other side of it. In this case we want to get ancestors so we put the GetAncestor code on that, operate that side of the equation rather than that quite. In this case, I guess I already change it to Bilbo, we can get all of the ancestors of Bilbo, we should go all the way up to Balbo there. Oh, no, direct ancestor, I don't know what I'm talking about. The director ancestor of Bilbo is as we know, Bungo. Was it Bungo guy who built back end? I can't remember. You don't remember? I think it might have been. >> These ancestors would be very ashamed of your understanding of their family tree. >> I think would be, indeed. Let's go and look at this in Postgres. I'm actually going to skip down a bit first, because I've got a way to do it here and then we've got actually a different way to do it. The different way to do it looks very similar to the way we were doing it in SQL server. Postgres has a lot of these methods, let me actually bring that up real quickly. On the Postgres documentation here, also looking the translation. Postgres translations, these are the translated functions for the L3 mapping. >> I hope everybody can see those funky operators on the right, that's what I was talking about before. Look at that three character one, question mark at sign greater than, isn't that great? Isn't that beautiful? >> Anyway, there's two different sub-path methods here, that basically go into your, if you think of a string representation and they'll strip off this bit all or strip off that bit all, you'll find the bit in the middle of it. You can do manipulations of the position in the tree by manipulating the path in that way. That's what I've done here. Instead of doing GetAncestor, I've done sub-path and then take the thing off the end. That gives me my ancestor. Then this looks basically the same as the SQL server query did with GetAncestor. Let's run that. We should see something with some funky operator again. >> Oh no, it's not Funky Operator. I made debugging. >> Yeah. You chose debugging instead of output. >> Output. There we go. It's not Funky Operator. I don't know what I'm doing now. Anyway, am I actually running in the debugger? Is that what's up? >> No, you're okay. >> Let me run again. I need to get back into where I was. We don't actually have an operator here. It's actually a function subpath, so it looks a bit more simple. But you can see we're passing in 0 and -1. This works. Bungo. My experience of these things though, the subpaths of trees, things that you have to be very careful for two reasons. One, because it's easy to get something that matches but is actually matching the wrong thing, especially if your hierarchy needs are in the 1.2.3. Remember I said you could do a subpath and it gets 1.1 and it finds 1.1, but it was actually because you've got the 1.1 of the end instead of the 1.1 of the beginning. When you run it with another value, gives you the wrong results. That I found quite hard to get it right. The other issue is they're not very forgiving. Our GetAncestor is quite forgiving. Postgres. For example, if I wanted to use this subpath thing, as a way, you could do the same thing that I did on SQL Server thinking, it's just the same as I used there, so this subpath is GetAncestor. This looks fine to me. I mean, it's doing the same thing. It's saying, If the ancestor of this thing is that, then it matches. But what happens when I run this is a big bang. Why is it throwing an exception? Well, the answer is that for some values of h1 path from patriarch, subpath 0 and -1 is not valid. It doesn't return a valid path and won't get ancestor. You say, it's fine, there isn't an ancestor, and keeps going. In Postgres, it throws. This is very data dependent. It depends what you're calling it on. It's quite tricky to make sure that you're only doing it when it's safe. You can obviously do h1 path from patriarch NLevel is greater than zero here. Then this should work fine if I've done that right. But you have to be a little careful with these things. Frequently, when I was trying things out, I ended up with it thrown in a section saying, you've gone outside the bounds, you've created something that isn't valid there. But now, it does work because now whenever passing the root essentially to this thing and it saying the root doesn't have any ancestor, that's invalid. There are other ways to do this though. That's why we have these ones here in Postgres where I'm doing it in a slightly different way. Which is better? I don't know. But this is a different way of doing it where I'm just saying, I'm using IsAncestorOf in this case, so Postgres has both IsancestorOf and IsDescendentOf, but I'm basically filtering the ancestors just to say, only give me the ones where the level is the ancestors level, level -1. That writes a different query, but it returns the same results. I have no idea if one's more efficient than the other or any of that stuff. Even in terms of what's easier to understand, I don't know. Although personally, I find this subpath things pretty hard. You have to really think about do I want it from this end, that end. This is perhaps easy for me, but it's totally up to you. Anyway, we get the same thing there. We can do exactly the same trick for finding the right descendants here, but we're doing NLevel +1 instead of -1. It essentially gives you the same thing as doing that, but it's just a little bit maybe simpler or more robust in terms of you don't have to worry about making sure you get correct path when you manipulate the path. >> Arthur, do you think we should stop and talk about the more conceptual things or do you want to go on with the examples? >> Very soon. >> Very soon? >> Yes. But I've got one more example. Obviously the burning questions for scholars of hobbit family trees is, who is the common ancestor? Actually, I'm going to change this to the more compelling question. Who is the common ancestor for Frodo and Bilbo? Why did Frodo get back in? Surely, it should have gone to Lotho, you would have thought. If you're following the normal inheritance stuff, it should have gone to Lotho, but it went to Frodo. Now, this isn't going to answer this question because we all know that the Frodo have got back end because Bilbo gave it to him in his will or whatever. Although it's interesting, he gave it to him, but then he went off to the west and he didn't potentially ever die, so should it actually have ever been Frodo's? I mean, I don't know. There's a legal question there, but it wouldn't been Lotho's either way. Anyway, enough of that. We want to find out who the common ancestor of these two are. Again, there may be different ways to write this. The way I did it is just find the descendants of one, the descendants of others and where they find all of the ones where those descendants, the Hobbit is both the descendant of Frodo and the descendant of Bilbo, ordering it so we get the closest one and then print it out. >> I just wrote it this way on SQL Server just because that's what came to mind and gave me the right answer for the question that I wanted, which is Bilbo. Postgres, I spent a lot of time and I don't even still know, I talked to Shay about this yesterday and I still don't know if what I came up with is even the right way of doing things on Earth because maybe Shay can talk a little about this in a minute. But Postgres has this longest common ancestor function that looks like it's perfect for this. Is except that it's a static function and you have to pass two things in. You need to do a query while you're doing a select many to bring both of them back. To make it similar to the SQL Server one, I had both the hobbits queried first and I was just passing them. But in this case, I'm actually re-selecting them in a sense because I couldn't figure out a way how if I'm just passing these in. Then of course this thing just brings back the LTree, the hierarchy, the LTree of the thing. You actually have to match it again in order to get back the thing that it matches. Shay, sent me this simple function yesterday and I'm like that looks awesome and then I tried to actually use it. As always, when somebody gives you an answer and they don't know your actual domain business logic, it didn't actually work for me. Anyway, so this works. We can run this. >> We can also use the SQL Server approach. You can also do the same. >> Yeah, you can absolutely do the SQL Server approach and it will be fine. But anyway, I just bringing this up now because this longest common ancestor looks so useful if you're trying to do these things, but it's perhaps not the easiest thing to use in every case. >> The source of the complexity here is that like any other function, in general, in SQL longest common ancestor needs to get two. LTree is right like two hats. It needs them on the same row. Where on a relational database, so it's going over row by row. Then when you want to invoke a function then you have to get the two values from the same row in prints. Now, of course, the thing here, the source of the complexity is here, we actually want two LTrees from different rows. We have various rows. This is why when you look at the actual SQL the SelectMany translates to a Cross Join. This is basically a trick of ours to take two rows and to flip them around horizontally so that you can take those two LTrees, which were first vertical, and make them horizontal so that we can invoke that function. Here, I think, definitely compared to what I experimented with yesterday, there's more complexity than what I tried yesterday. But the basic principle of it is the same. We're using SelectMany, so this has a self-join. This is where we're joining the table with itself. Then theoretically, all the combinations of different habits are joined together so that they're on the same row so that we can call longest common ancestor on them. Which is a nice string. It's not just an Altria triggered, it's something that's used in SQL programming in general. Just so that people are aware. >> Indeed, yes. I was going to try and find what you sent me yesterday, but I won't. But it also sometimes looks easier to do a SelectMany using the comprehension syntax rather than doing a SelectMany way you can do a join and then it looks easier. Anyway, this is the end of the common examples between SQL Server and Postgres. We can answer some general questions, and then we can also go and look at some things that you can do with Postgres conceptually a little bit different from what SQL Server supports. >> I want to say a few things. I think there's a discussion to be had. We had this discussion offline, but I'll show as an introduction to this that this question has come up at least three times, I think in the chat, I'll highlight one of them. The question is to compare hierarchy IDs or LTrees with just the plain old style child-parent relationships within based identity and foreign keys. I'll show another one basically that asks the same question. What about id<-parent_id using recursive CTEs for all that stuff? First, I want to clarify what this means, for people who are not necessarily aware. Instead of all this buzz around Altria and hierarchy, if you just have your basic relational database, the way to represent a model is very simple. Every row can have a parent ID, which is just a foreign key relationship to the same table. Basically, you have a person in that or halfling in a row. Then each halfling has a column that says, who has my ancestor basically? Using that, you can basically model, it's the same thing you're modeling a tree. You're organizing all the rows in that table into a tree by simply having a single column which has a foreign key constraint back to that same table. That's a very common way to model things around. Indeed it is an alternative way to do exactly what Arthur has been showing you, which is having a table of halflings. Now at this point, of course, this begs the question of why are we doing it this way with hierarchy ID. First, I came from the Postgres world with Altria. If you look at how people use Altria in general, you look at the samples, you see something a little bit different from what Arthur showed. In Arthur's example here, the things inside the hierarchy ID or the LTree are rows in the table itself. It's a hierarchy that basically is applied to the row of the tables. Inside the LTree for a given halfling, you have other halflings. This by the way is why you have numbers in there. Those numbers actually correspond to the primary key of those other halflings. Integrating, we're basically refer, is that not correct, Arthur? I see you shaking your head. >> They don't have to respond to the primary key, but they are an alternate key in some sense. >> They identify the other rows in the same way. >> They identify the position of that row in the hierarchy. Yes. >> Exactly. Notably what this means is if we want to suddenly do some re-parenting or changing the hierarchy, that actually would require quite a bit of potentially complex queries. We'd have to go over all the hierarchy IDs of everybody in the database in order to modify some number in the middle to some other number in the middle. Just to make sure that people understand what it means when my hierarchy ID value is actually referencing rows. Just the alternative, what I've been seeing in, none of us who are experts in hierarchy database things. But the thing I'm used to seeing is a little bit different. In Postgres like the first sample that you see is not something that refers to other rows, but just something that applies a classification or an organization to that specific role. For example, let's say you have your employees. So you have a lot of employees and you want to say which departments these employees belonged to. Departments are not employees. This is the crucial point here. A department is not another row in the same table that we were just discussing. Departments can be hierarchical. The engineering department that's part of the R&D department, that's part of this department. It could be arbitrarily deep, so it is a hierarchy. But the things inside the Altria generally don't represent at all other rows in any way. The rows are, you could think about them as the leaves, the employees, or whatever. But the thing that you shove inside your actual LTree value is not at all other employees or other things, and other rows. This is why in the LTree sample, maybe Arthur, we can show that sample with the, I can also do this. >> Yeah, I can. I'm going to show the [OVERLAPPING] >> Just to show an alternative thing. >> Yeah. >> Like from the docs. >> Okay. Let me do this. >> Here's an example. >> Great. This is the example I'm used to also working with. I wrote the tests using, like for the posttest provider, I wrote all the support for Altria and I use these tests as well. What you see here as categorization of subjects or whatever. There's top which is an artificial node that's the top of the root. Then there's science versus Hobbes versus collections, and under that there's astronomy versus amateurs versus pictures. The crucial point, sorry. >> No I just, I look to this and I'm like what is this? Is this really subjects like collections, like it's just things, interests maybe, I don't know. Anyway, keep going. >> Maybe. It's some classification that makes sense for you, for your rows and your table that basically is what L tree is supposed to be about. That's also why it's about labels. This really is just a form of labeling that also happens to support hierarchies trees. That's all that it is, which I think maybe is philosophically a little bit different from the hierarchy idea approach, maybe I don't know this. Again, we just discussed this Arthur and me a little bit in the past few days, but none of us are experts in this in anyway. But the point I'm trying to make is if you're using this organizational structure in L tree, then obviously it doesn't make any more sense to compare it to the traditional parent ID thing because the relationship is no longer between rows. You can't just show a parent ID and point to another row because the Altria doesn't contain rows anymore. It contains some arbitrary, this thing with strings basically. I guess now we can go back to the original question that people have said. If we want to compare this with the traditional thing, then that means already that we're using hierarchy ID in a way that represents a tree over rows, and where the rows themselves participate in the tree. It's the first model rather than the second model and that's a point that I wanted to drive home. I talked for a long time. Arthur does this makes sense. Do you want to add anything? >> I can talk too. >> Absolutely. >> I think that makes sense for the Postgres view of it. But I mean the, the SQL Server one is called hierarchy ID. In most of the sample databases, it truly is the primary key. If you think about it, is the difference between knowing your manager or your management chain. If you meet another employee in the hall and all you know is your manager and you want to know who's our common ancestor effectively, you'll have to go find your manager, he'll have to go find his Manager, they'll have to find their managers. But if you know your chain, you can answer that question with only two pieces of information. That's why these are more performant and more optimized because any two notes can know their relationship to each other without having to write a CTE or all these joins, just trying to figure out this information. >> I think that is critical. If you read this SQL Server docs, I have a little bit at least being able to do these kinds of queries efficiently. I can find all the descendants efficiently and do that without a recursive CT, that in its name has to recurse. You don't need to do that because you've got all of that stuff encoded already in the hierarchy ID, that's a big reason I think. >> Exactly. >> I'd love to know why they added it to SQL Server, like what the use case was? It may pure speculation on my part, I have no knowledge on this, but it makes me think when in fashion they we're going to store file systems and stuff in the database that they were like, we need an efficient way to be able to query file systems and things in the database because that's one of the big advantages of it. >> I think both of you obviously hit the nail on the head. The point is that you have all the information right there on the role basically and you have indexes that's basically all. I think that is enough, at least for me as a proof guide, that's enough of an answer for sure. It's a beautiful feature, just looking at that. I'm sure we've been throwing this term around recursive CTEs. I'm sure people here, our viewers aren't necessarily aware of it. I don't think we have an example like often to show, but there's a thing in SQL called a width clause. It's for a CTE, that's a common table expression, that's what CTE means. Basically, it's the only way where you can introduce recursiveness into SQL in general, SQL does not support recursiveness except via this tool which is called recursive CTEs. When you use the parent ID, the classical way to model a hierarchy then if you want to do something like get me all the descendants of somebody, then you can't just write a Select to do it because you only get, if you ask, get me that the descedants of X then you'll only get the direct descedants. If you want to go and get those descendants as well, and then those descendants as well, then that is a recursive query. This is a thing that you can do with this tool called recursive CTEs. However, that every time is like, I'm not yet another big query that you're doing in a recursive way. Efficiency-wise, this is not ideal recursive CTEs. >> Is like doing lots of joins together. >> You can look at it that way. Exactly. I mean, there might be tricks and various ways to optimize and in a database, but it's not going to be like looking at a specific row. Basically or just looking at all the rows who's hierarchy ID begins with something which is a pretty simple thing if you have an index and SQL Server supports. I looked into this a little bit, it supports indexes over this which allow you exactly this query. Give me all the rows where the hierarchy ID happens to start with one slash two slash four or whatever. On the post websites, you have the same thing, so you have efficient indexes as well on the Altria. But things don't necessarily as Brice said. I mean, things don't correspond to rows, it's like a different thing. It's an arbitrary classification thing that's detached from your rows. Basically it's just a bunch of labels. By the way, sorry, I'll just say one last thing and if I'm not mistaken, and Brice will correct me on this, one of the major differences here people have been asking, can we haven't abstraction over these two things? Like one big difference for example in postgres, the labels are just labels. They can be textual as you saw in the example. Whereas, if I understand correctly, that's not the case in the SQL Server. >> Like spatial, there's a standard, there's a spatial standard and all the databases implement, there is no hierarchy ID standard. >> Exactly. >> They invent their own solution to these problems. >> Exactly. >> I mean, think from an abstraction. Ties with what I'm going to say, which is that postgres, supports the more restricted view that SQL Server has over these things. But postgres, being the overachiever in the database world, can go and do other stuff that is conceptually somewhat different from what the SQL Server thing does. In terms of abstractions, I think you could imagine an abstraction over the basic concept of them as IDs and that could be implemented in both but it would be limited to that. Then I have no idea what MySQL or any other databases support in this. The one thing that Diego brought up years ago was, it would be really cool if we could actually map these things and navigation properties. Like I've got a half ling and then its ancestor is a navigation property, and then it has a collection of its descendants. In my object model, I have something that looks very nicely like a C# object model but then that translates to hierarchy IDs or L trees in the databases, and I think that would be really cool. I think it's non-trivial to do in some respects, but I haven't tried to do it either. But I think that would be a very interesting specific abstraction of this that would fit to both of them and allow you to model these things into perhaps a more natural way. >> That I think actually goes back to this question of the parent ID that people were asking. In effect, the hierarchy ID rather value that you have there is like a foreign key, to your parent and also all of their parents. All the way up the chain. >> It's a key that contains everything. All of it's like primary. >> Like a variable length list of foreign keys, you can think. >> Exactly. >> You could think about like something like that. >> We're going to look at some cool Postgres examples that think about it in this other way in a minute. Before we do that though, I just want to go back to something that Shay said which is about manipulating these things. One of the reasons why I didn't make it, and I was going to try and do this on Postgres but postgres as far as I can tell, you could write this on Postgres, you could do this because it is client-side stuff, but I didn't have time to do it. I haven't got the Postgres example of this, but this is on the dotnet bloggers, which I copied and pasted from what's new in EF Core 8 so either place, how's this? But the idea here is I actually do want to take a subtree. If we go back, if I can find my thing. What did we say? Let me read the text on here first. Shay reckoning 1752 LongoGate we revealed that Longo is not in fact the son of Mungo, but actually the son of Ponto. I know. Longo here, not the son of was that, I've forgotten now. Longo not the son of Mungo actually the son of Ponto. We want to move Longo. This tree, all of Longo's things and put them onto Ponto here. This is a place where ideally we don't want the primary keys of Longo, Ortho and Lotho to change because they're actually the same entities, but their place in this tree structure has changed. We use the "Word" re-parenting in databases or object models a lot, but it's literally, it's like I'm re-parenting Longo from Mungo onto Ponto. SQL server has this fairly useful function for doing that called get reparented value, where you basically say, given this descendant, go from Mungo to Ponto instead. You can use, you can load the descendants into EF Core run list, this is a client side thing. I mean, it probably translates to server you could do with their execute update or whatever as well. But in this case I'm doing on the client side and then save it, and then it's going to then send back. You can see actually here it's interesting to see the parameters being sent in the compact binary format. >> Nice. >> Then after that, now if we do Mungo, we get these things. Well, Ponto now has Longo in the Ponto. There is actually a case where you can separate this thing out from the primary key and say actually this is not necessarily the primary key of the thing. It represents the key within the tree, but that could be conceptually different and we might want to manipulate that to move parts of the tree around. File system for example copy this folder to here. Now maybe you would want to do that by deleting it here and recreating it here, I don't know. But either way, it's conceptually you're moving things around in the tree without changing their primary identity as it were. Anyway so that's just an interesting thing there. Because Postgres has this representation of it and we're not thinking of it in strictly in terms of keys in a tree in a sense key, like on the SQL server model, we can do some interesting things with it. I'm not going to show very many of them primarily because I run out of time preparing the demo. But I am going to show some pattern-matching and free text search that you can do within these things, which is cool. I have in here, I've called these things interests because like I was saying to Shay, I don't know what these things really are. They don't map very well to an entity type in this conceptual model like we were talking about. I put L tree as the primary key in the interest. All we really care about is the labels in the tree. So I put comments on here like I still some comments about each of these things. But again, it's like, does this really map to an entity type in the same way that it does for the other case? I'm not sure. Then for the seeder, I have this case where we're doing the same thing, but we're just using these labels from the Postgres example. Let me go to the Postgres example here. I'm going to change this to interests, contexts. Then comment down this stuff. I've got a pattern matching example here. But what we can do here is we can actually say that this matches lQuery method, which is going to translate into one of those funky operators. It's going to my *.astronomy.* so basically return everything in here that has astronomy. I'm going to run this, but let's go back and look here. This is interesting. You'll notice because astronomy actually appears at two different levels in the tree. Rather than just being a number, we're actually saying the semantic meaning to the fact that the Word astronomy appears here and appears here they are related in a way that isn't true if this was just one two three and this was one two three four. Didn't you wouldn't just because this is one on this these one doesn't mean they're related. But the fact that this is astronomy and this is astronomy, and we're saying there's some semantic meaning to that. We can now run this query which uses what's the operator it uses? The parameters is this one. It's just the tilde operator. There you go. There we go. We can again by everything with astronomy, no matter what level it's at. It's basically doing pattern matching and because we could do so and see patterns, I don't even know what this pattern means. Maybe one of Brice or Shay knows what *.bangpictures@astronomy means. >> I don't know. It's a whole query language in there, which I don't know. >> It's matching. Let's assume that that's the correct match for that pattern. For which Shay or Brice guesses about appropriate more likely. Very similar to that, we can do a full text search on these things. In this case, here for doing the patent matching we're doing MatchesLQuery. In this case, we're doing MatchesLTxtsQuery. We can run this again, and it will use a slightly different operator. So it's using the app operator here. I don't know anyway because I did this like five minutes before the show. It's presumably doing a free text search on the ltrees text. Basically, there's a bunch of other things here so two pages. We stopped doing show links sample. We need to do this yet. If we had show links, these would be going in the shown link, to documentation pages. There's this documentation page for the postgres support, which talks about all of these different things. Goes through conceptually, the difference between the text query and the L query for the free text and whatnot and then talks about all of the different operators here that you have. Then the function so we mentioned subtree and subpath and n level. You can also do an index and then some other stuff here. Then it talks about creating indexes on here, which is useful. Then it has some of these examples where I just copied and pasted and put them into EF. Then on the Npgsql EF core provide a page, there's the functions of ltrees and what they translate. So you can just say I've read that and I know I need the question mark at greater than operator. Then I'm going to pass a list of ltrees and do FirstOrDefault and do IsAncestorOf. That's what it will translate to. Postgres has a lot of stuff there. But in general, I think, We've covered why you probably would, might want to use these things, how you could do it in SQL Server and Postgres and at least hopefully given people an idea. Any questions that we should cover before we end Shay? >> There's one that I want to speak too quickly. What is the benefit of using this approach versus JSONB? So basically starting a JSON document. First of all, what we've been talking about here, both LTP and hierarchy IDs are a path within a tree. They are a specific path, whereas a JSON document represents an entire tree, that means that every row has a whole tree there. JSON is just a different thing. If you're trying to model a hierarchy where each row is part of that hierarchy, I don't think JSONB makes a lot of sense. Of course, if you don't, for example, if I were in some other database like MySQL, which doesn't have one of these. Then one way to do this would be to store an array of all the IDs of my parents or all my categories or department names. Since database is generally don't support arrays, you could use JSON to hold an array on each one of you rows, but that's not like a full-fledged JSON document. It's just a way of holding an array. It's important to understand that conceptually these are arrays of IDs or department names or whatever it is that you want. They're not a tree. They make up a tree. If you take all of you rows together and you put all of them together, then you have this big tree. But each row on its own does not contain a tree. It just contains a path within that tree. That was my answer around it. The last question for you, Arthur, is will, this sample code be available? >> Yes, I will put it up on my GitHub account, ajcvickers. I'll push it up. If not tonight, then tomorrow. I will definitely do that. Also you can also look in the What's New in EF Core, AKAMS/EF8H-new. That has a bunch of the SQL Server stuff there with runnable samples that you can get right now, where we call hobbits offerings, but it's the same stuff. But, I'll put the Postgres stuff up as well. >> It's nice. The samples are great because they're EF code using these two APIs. But in general, you can just look at the docs, understand what it is you want to do and then just, call the methods in our docs which translate to those operations. It shouldn't be too hard to piece together. But of course, samples are always even better. >> I see somebody asking about MySQL there. I don't know if it's been asked before. I have no idea whether there is support for this in MySQL or not, likewise for SQLite. Brice, I don't know, SQL has anything like this, but both the MySQL provider and the SQLite provider are open-source. Even if you don't add it to the variety, you could probably create a package in the same way the community did for that. Feel free to go in and look at it or whatever. It's not something that we on the team are planning to do. But we'd be happy to help out with the community advice, make sure it works the same way and all that stuff. >> Cool. >> I think that's it? >> We're not like super overtime. Only like 17 minutes after the hour. >> Only 17 minutes. Well, it's the record first. Thanks, everybody. Appreciate you watching. We'll figure out something to do in two weeks time, probably the day before. Bye. Thanks all. >> Bye. >> Bye. >> [MUSIC]
Info
Channel: dotnet
Views: 8,077
Rating: undefined out of 5
Keywords: entityframework, csharp, data, dotnet
Id: pmnHGWYpCfg
Channel Id: undefined
Length: 78min 48sec (4728 seconds)
Published: Wed Mar 22 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.