[MUSIC] Count down beginning. Two minus two minutes. [MUSIC] >> Jeremy Likness, Shay Rojansky, Brice Lambson, Arthur Vickers. >> SQL Server, PostGreSQL, SQLite, Oracle, MySQL, Cosmos DB. >> Net Data, T-1 minute. C#, F#, VB,.Net Data, Topology Suite, 0Data, GraphQL, EF Core Power
tools, Node Time. >> 30 seconds. >> Open source software, performance, System Data, Dapper, ASP.NET Core, Entity Framework Core. >> Ten, nine, eight, seven, six, five, four, three, two, one. Lift off. We are going
for.NET community stand-up. >> Here we are again,.NET
data community stand-up. This week we're going to talk
about hierarchical data. I'm joined by Shay from the.NET
DataAccess team as always, and then also Brice who has worked with the community
on this over the years. He's going to talk in a little bit about the community history of this. As is always the case it's been a little bit of a scramble
of getting everything ready. If there's some live demo issues, then that's the way it goes, but we'll see what happens. Let's start. As always, with the
state of the unicorn. Seems like forever ago that
we released these things now, but it's actually
less than two weeks. EF core 7.0.4 is out there. Seven more bug fixes. We got a lot bug fixes in 7.0.3, but they're still important stuff
still coming in so get 7.0.4. Actually one of those bug fixes
it's a Microsoft data SQL client not in EF and we always
forget about it. We'd never mentioned Microsoft data, SQL client and its Brice's thing
that he created and everything. Sorry, Brice that we always
just gloss over that, but there's a bug fixing
Microsoft ADC client. I guess it's good that it's
hardly ever any bug fixes in it, but there is one in this case. Then also the EF Core 8
Preview 2 is on NuGet, which has SQL Server
hierarchy ideas, official support, which we're
going to talk about today. Then also JSON columns in SQLite. This is the same support we have
for JSON columns in SQL Server, now works on SQLite as well. We're still working on lots of
improvements in the SQLite space. Marie, she's working hard on that, but what you could do in SQL Server, you can now do in SQLite plus
because there's already been some enhancements to JSON
columns support in Preview 1, and those also work on SQLite. As always, aka.ms/ef8-new, that's the watch new in EF8 and there's runnable
samples on GitHub. I'll try and upload the code that
I'm doing here to GitHub later. I haven't done it yet because
it's been a scramble. But also the code from the
stand up I'll put there. Let's talk a little bit
about hierarchical data. What we mean by that. We'll get into this a little bit more later when we
look at differences between Postgres and
SQL Server as well. But for now we're just going to do the common denominator type thing. We've got a very famous
family tree here of hobbits. I've been trying to avoid seeing trademark terms in writing them
down in any of the online stuff. We refer to as hotlinks and
anything that's written down. >> I don't think hobbits are
a trade trademark thing. >> Tolkien Estate. Yes. >> Are you saying he
invented hobbits? I'm very surprised. >> Yes. >> Isn't this an English
word from the whatever? >> Oh, well, we could go into
the likely history of it. >> We absolutely should. >> Anyway, if they had to cease and desist and tell us to
remove this YouTube video, then we will do that. But I figured I can be
a little bit more free on the video that I
have been on stuff we publish on the.NET
network or whatever. Anyway, what do we mean
by hierarchical data? Essentially tree structures. I mean, that's basically what it means. You've got a root
and then you've got nodes down in the
tree going to leave. That's essentially what we
mean by hierarchical data. SQL Server has a particular syntax
for representing these things. Postgres syntax is more flexible. But what I'm going to do is
just to make this really clear, I'm going to go through this
tree structure and show the SQL Server syntax so that when
we get to looking at the code, you can mentally move back between the tree image and what you're seeing in terms of
what's in the code. A SQL Server, the
root is just a slash. That's the patriarch of the family. Of course, this Tolkien's day, this is a patriarchal family tree. That's just the way it is. Sorry, but that's
what's in the book. That's what we're using. Anyway.
Now we get to the first level. SQL Server is the first level. Each level is basically
prepends the level before it. The root is /, and then the first level is /1/, /2/, /3/, /4/, /5/. These are all with children
on the first level. Now when we go down to
the children of Mungo, again, all the children Mungo are
prefix by Mungo's hierarchy ID. This is what this is.
This is the hierarchy ID. The first child is
Mungo's prefix /1/, and then 2, 3, 4, 5. Likewise, if we go to Ponto here, since Ponto was /3/, then Rosa is /3/1/ and Polo is
/3/2/ and so on for other ones. Then likewise, when we go down to the next level, it's
just the same thing. The prefix, and then you add
another suffix on the end of it. Bilbo /1/1/1/, so you can tell why Bilbo inherited back-end because Bilbo is the first-line
male descendant in the Baggins' family tree there. Otho, somewhat infamous and Falco, /1/5/1/, and then Posco and Prisca, I assume that you say that. Then of course now we're
going down this other side of the family tree where
everything is from Largo, so these are all prefixed with
/4/ and we get down to here, we've got /4/1/2/ and so on. Then final Lotho, very, very infamous
Sackville-Baggins there. Poppy, Ponto, Porto, and Peony, and then Frodo, as I think most people have heard
of Frodo, so /4/1/2/1/. I think we've brought two more here. Then Angelica is the only
fifth-level one we have right now. This is basically what we mean by
a tree structure, a hierarchy. Now, I can say Postgres, there's a bit more
flexibility in this, but let's stick with the SQL server
convention just to start with. >> I'm going to switch
to show some code. I've got a lot of code here. We're going to look at, I've got both SQL Server and
Postgres examples in here. We're going to look
at both how you do this in Postgres and SQL Server. Before we go into that, let's talk a little bit
about the packages involved, the history of how this came about, and this is for SQL
Server specifically. The hierarchy ID is actually been
supported for quite some time. Let me find my window here. It's been supported quite some
time by this community project. Do you want to talk
about how that came about Brice and the history of this? >> Yeah, sure. Way
back in EF Core 22, which was released in November
2019 or somewhere around there. Is that right? That might be right. Anyway, whenever that was, I was doing a bunch of the
spatial work which was bringing in the SQL geometry and
the SQL geography types. While I was doing
that work, I'm like, we can easily do this the
exact same for hierarchy ID, but the problem was there wasn't a hierarchy ID library out there, a special we chose a topologysly. But for the hierarchy ID I found a community packaged by Martin Nelson who actually brought in a bunch of
the spatial stuff. It was unsupported though, we were hesitant to just say, hey yeah let's merge
unsupported code into the main Entity
Framework Core branch. I spiked down some code, I wrote down some code,
and I put it out there. I don't even know if it
was working completely, then I of course got distracted
with other features. But a community member named Adam Jones wanted
hierarchy ID support. He found it and he brought
it up to speed and he got it working and end with test
and so we released it. I think we even released a compatible with Entity
Framework Core 22. Then we've kept stringing it along, there's been lots of
contributions to it, to evolve it, to add
this functionality. But it wasn't until last year, earlier this year that the
SQL Server team finally released official SQL
hierarchy ID support. What we did is we reached out to all the community members who
had contributed and said, hey, would you guys want to
contribute all of this into the main Entity Framework
Core projects so it can become official and support it, now that we can have
end-to-end support, they all happily agreed. Obviously, they wanted hierarchy
ID support [inaudible], that's why they contributed. I just recently in time
for EDO preview too. Actually merged all the code in, that's how it's finally
an official package, but the package has been
around for a while, it's been pretty heavily
tested by the community it has some effects but the
functionality is pretty well. >> Thanks to everybody who's in the community who's
contributed over this, you made this happen, we really appreciate it. Also, if you're watching this and
you're like I'm not on EF8 yet, remember there is a
community package here, you can do this. I mean, I don't even know
how far it goes back, at least three WAN which isn't
even supported anywhere. Basically whatever the version of support EF Core you're
using because obviously, you're not using an
unsupported version. Every version of EF Core that you're using has some form of
hierarchy ID support, which is basically, this
may be a few differences, but basically the same as what
we're going to show here, basically the same code. Let's talk also a little bit
about packages for hierarchy ID. There's three levels of
packages and assemblies. This is usually not that important. Usually what you want to do is
just include this in your project, the Microsoft Entity Framework Core SQL Server hierarchy
ID project package. That brings in
everything you need to write queries to integrate with
EF Core and all of that stuff. But actually, this package itself is dependent on this package
called SQL Server abstraction, which is another new package. This has the actual
hierarchy ID type that you can put in your entities in it that you'll see in a minute. Like all of our
abstractions package, this is a very light dependencies. See it doesn't depend critically on any other
Entity Framework packages. If you are trying to keep
your data access code if you like out of your model but you want to use
hierarchy IDs in your model, then you bring this in and you
get very minimal dependencies, very minimal impact
on your deployment. Then as Brice mentioned at the lowest level this is the
Microsoft SQL server types package. This is the one that their SQL Server people
have released to actually support various custom EDT
essentially that we're in SQL Server. This is the special
types of geometry and geography and the hierarchy ID type. Are there any of these Brice? >> No [inaudible]. >> This is at the
lowest level thing. These things are SQL
Server dependent, as we'll see in a minute, the Postgres stuff is
independent of this. It would've been nice if we had
some common thing between them, but there's quite a
lot of differences. As Brice says it wasn't
a community library, we've ended up here, it's not. A lot of the functionality
is very similar, but the actual types
that use are different. >> [inaudible] putting out to
that the hierarchy ID type is backed by the SQL
Server types library. That'll work outside of SQL Server. If you're just in a
pure dot net app, you can use this and all the
functionality will work. >> Yeah, it's a little bit different from geometry and
geography in that all of the methods have quite implementations
Brice in the hierarchy ID, so that even though like
in SQL Server right now if you use one of the geography from
this and then you do something, it's not implemented everywhere. On the client side it only
worsens a translation, whereas the hierarchical
ID is more simpler. >> It actually works on Windows, is because there's a native DLL, it's highly optimized code, they want these
operations to be fast. >> But hierarchy ID
is not like that, it will work everywhere. >> Pure managed, it's all running. >> Let's go back to the code then. Should we stop for any questions? Sure, I haven't been
looking at the chat. >> I don't think. I think let's
go see some hierarchy ID, and then there's
already some questions but better deal with them
after we see some code. >> Okay. In order to use, in order to get hierarchy ID support in with SQL Server,
you need to do this. Use hierarchy ID thing
that tells us stop, that tells us to map the hierarchy ID type and
everything like that. If you're doing Postgres, you
don't need to do that because Postgres has the Altria type, which is equivalent to a
hierarchy ID built into the ada.net provider it
brings it in any way, and so it's all there by default. I'm using User Secrets
so that sure I can't see my Postgres
password there. Anyway that's neither
here nor there. >> Nice. >> Let's look at the
little application I have here to test things out, make a clean database, and then seed it with
some hierarchy ID things. If we go back and
remember this here, so we had all of these, we built up these
hierarchy ID strings, so this is what I'm now adding
to the SQL Server database. I have a class called Hobbit
that has a name, year of birth, and then in here I've got
the hierarchy ID which is the path down from the patriarch to this particular item in the tree. I could make this the
primary key that's valid. You can use hierarchy
IDs as the primary key. You can't mutate it. EF doesn't allow you to
mutate if the primary key, and so if you want to repair and pop the tree to another
part, then it's nice. Not having its a
primary key but anyway, that's an option that
is always there. Now, actually SQL Server has this compact performance
binary representation of hierarchy IDs under the hood. What we're doing here,
what I was saying we Parse from this human
readable string format, and to create that representation. I think in API review, we said we're going to make a constructor for this rather than
necessarily having to go Parse. Really, you don't need
to worry generally about the fact that brightening
representation is there, just think about it in terms of
the strings and everything's fine. But under the covers it's not actually storing a
string in SQL Server, it's storing a compact binary
representation of that thing. Does Postgres have that
too, Shay do you know? >> I have no idea. It's a purely
internal implementation detail inside elsewhere, so who knows. >> Anyway, so yeah, we're doing a normal thing
here where I've added all these things and I'm
calling Save Changes. Let's look at the
Postgres version of this. Now the Postgres version
of this doesn't have the same strict syntax
that SQL Server does. In fact, all it requires is that you have a dot period character
separating each level. You can put alphanumeric
characters in there, you can do various other things and we'll come back
to that bit later. I have another example
down here that uses and shows a bit more of that later. But for now, just to keep things very common between
what we're doing on SQL Server and
Postgres so we can see the same operations
all I've done is, take the same SQL Server representation
and put periods in here. This can cause some issues. If you're not very
careful with Postgres, because you don't have a lot
of unique list like this. If you're looking at
sub-tree, for example, the 1.1 exists lots of different places in here
and it has different meanings, but we'll get back to that later. But just be aware that you might not want to make it this
simple in Postgres, depending on your case we can
come back and look at that. We've stored data, let's do some queries. I have got all the
queries here cause I was rather than copying and
pasting them in from elsewhere, I was getting very
confused about which are my Postgres ones and which
are my SQL Server ones, so I'm just going to uncomment them as we go
so I don't get confused. Find a generations. Well, there's
a Typhon, find a generation. But one of the first things that you can do whether
or not this is common, Shay, you might say that
this is a bit artificial. I think it depends on how you're
conceptualizing the family tree, but you can say, give me
everything at certain levels. On here, if we say
this is level zero, this is level one and level two, so we're saying give all of these at this level no matter what
their ancestors are, if they're at this level in
the tree, give me them all. In order to do that, the hierarchy ID to
by-parse this GetLevel function and you can
match that to a level. If we run this, you'll see, I'm running the postscript
and I knew that was going to happen. How do I? Maybe I'Il just do it in here, on SQL Server. There we go. We'll store all the data. Actually, before we look at
the output of the query, let's just take a quick look
at the tables just so we can see the hierarchy ID in the table. Here you can see that
we've created a table and we're using this SQL
server type hierarchy ID. This is a built-in UDT if that's not a contradiction
in terms in SQL Server. When we run this query, and you see we get this
translated and it's translated to the Where clause whereas using the GetLevel on
the UDT and SQL Server, and as you can see, we're returning all of
those hobbits that are at that generation two. In Postgres, let me open
the oldest whitespace. It's exactly the same thing. The difference is that in Postgres, its property called NLevel instead
of a method called GetLevel. Also, I think I might
not have mentioned this, in Postgres the actual type
is not called hierarchy ID, it's called LTree, but it means basically
the same thing. >> LTree stands for a label tree, like a tree as labels. We'll talk about that
a bit later, I think. >> Yeah. Running this, I want to show the table, it's the same as it was in SQL
Server just with LTree is the type. But you can see we get
essentially the same thing except we've translated to
this NLevel Postgres function, and then we get the same
results out of here. Pretty simple, but
translate it to the server, presumably doing
efficiently on the server, that's what the idea of
storing these things in this way is that you do it
efficiently on the server. We'll stay on Postgres
since we're here right now. The example I've done for this
hidden what's new docks I realized afterwards was much more
complicated than it needs to be. We want to find all generations. I'm doing a GroupBy here, so just group by the level. If we run that, we should see that
we get a group for each level. There we go, so we've got each
level in the tree which is exactly this level down to Angelica
by herself at the end there. Just a little digression here. This is a GroupBy as
a final operator. You might notice there's
no grouping and SQL, Shay. >> This is something, this query, if you tried this before EF Core 7, this would result in an error. Now, we would not translate this. The reason is previously
we would only do GroupBy when we could translate
it to a SQL Server GroupBy. In SQL you can't just do a GroupBy. In SQL you have to also
use an aggregate function to reduce the groups
back into tabular form. Relational databases cannot have
groups of things like that, and then you only deal with
basically with table form of stuff. But what we did for
EF Core 7 to make this very popular query
work is basically, as you can see, we translate it to something that doesn't do
a GroupBy on this server. We get back all the rows
exactly as they are, but we group them for
you on the client side. Notice that there is
still an order by there. We order by the key instead
of grouping by key, and then we get basically
all the rows for a certain key value and then all
the rows for another key value, and client side inside EF Core, we pack them for you into groupings and give
you access to that thing. The key point about GroupBy
is when you think about it, just putting a GroupBy doesn't reduce the amount
of data in any way. It just packs the
shapes differently. Instead of giving a lot of rows, you get the exact same rows, but grouped in some way. The same amount of data is basically
coming back from the server, which is why we think it's okay to allow you to do
GroupBy like this, so there's nothing here in terms
of performance that would be bad. You're just reshaping your data, which is why this is fine. But it does not do
GroupBy in this N SQL, which is a little bit weird. >> But it's not inefficient and it's not bad client evaluation or
anything like that, it's all fine. >> I like to think of it as to
dictionary or to illustrate. Those don't really
translate to the database, but on the client we can
shove it into that shape. >> Yes, exactly. >> That's very true. >> Yeah. >> Let's go back to SQL server and we can do the same thing on
SQL server just for completeness. Find all generations, again, it looks exactly the same. Tap that, we're doing a
GetLevel boat this GroupBy in the same way and it will translate or I'll probably run SQL server. How many snaps? That's good. We're doing an older by
here in the same way. Translation is the same, Postgres and SQL server
in that respect. Let's do something
maybe more interesting. Or should we stop for
questions at all, Shay? >> No, let's go on. Let's keep going and just show code. There's very legitimate questions of why would I want to use this
as opposed to for example, like foreign key constraints
within the table like so, and all that stuff
is very important, but I think now we're just
looking at what this does. >> Let's keep going
on. I want to find all the descendants, given Mungo. Let's find Mungo, where
is Mungo? mungo there. Mungo has got big loaded descended. I want to find all of these
descendants of Mungo. Well, SQL server has a function on a hierarchy
code is descendant off. We can use that, it will tell us, the hierarchy that
you're looking at is a descendant of the
one that you pass it. You know what I mean? Anyway, I mean backwards when I
was talking about it. Interestingly, and I don't know
if anyone check and confirm this, maybe this is a graph theory
thing or something, I don't know. But Postgres and SQL
server recognize yourself the thing
that you're testing as being both a descendant
and an ancestor of itself. If I asked for descendant of Mungo, Mungo is considered a
descendant of Mungo, which to me was a bit weird. Anyway, I didn't want
my result aside, you started this other
course which is like, he got to return me and don't
return it. I don't want that one. Let's just run this and we can
see the query. Here we go. Descendants of Mungo
and here they are. You can do orderings and
stuff to get them in the appropriate levels
and things like that. But I haven't done that right now. In some of the examples
in the What's New, I've done some
ordering things there. But basically it's just calling
this function IsDescendantOf. If we go to Postgres, we have essentially
the same thing here. In fact the method is
even called the same, IsDescendantOf, so that
looks basically the same. If I run it, the SQL
looks a bit different, when we're doing Postgres and
SQL server, because SQL server, it's implemented in the
database as a UDP type, because the methods look
the same whereas Postgres, there's more of a
translation from methods in the.NET world to
function in Postgres. We see that here, where in Postgres they actually
have this funky operator here. Where you're saying, this is basically is to send it off, that's what this operator means. >> Postgres loves operators. Everybody who uses Postgres with the more advanced features
know that there's a ton, there's like a zoo of
operators with all kinds of characters and
it's all great fun. But the story behind this
is that there used to be a limitation in Postgres. Where if you wanted something
to be sped up by an index, what we sometimes called searchable, then it had to be an operator. You couldn't have a function, though it was just engineered that way and I heard that went
away that limitation. But that explains every
time they wanted to have an operation that could run quickly
like a contains of some sort, then they couldn't
introduce a function because they wouldn't
be able to speed it up, so they invented some operator
and it's really funny. You even have operators
with three characters. Because the characters
aren't enough, so it gets really wild, anyway. >> The obvious thing to do next, we've looked at all the descendants, let's find all ancestors. Longo here. Where is Longo? Longo is there. So all ancestors
would be Mungo and Balbo. Postgres, just like it
has its descendant of, has its ancestor of, again filter out because apparently Longo is an ancestor of himself. Anyway, this will similarly translate to
an operator on Postgres, but a slightly different one, only slightly different, that was put in the other way. But it all makes sense obviously. Going back to SQL server. SQL server house is descendant of, those in half is ancestor of. But that's not really a problem because if you want to find the
ancestors as opposed descendants, you just switch the operands around. Up here we were saying, where the H1 path
is a descendant of, and down here in order to find
ancestors we're saying where the path of the thing
that we're looking for is a descendant of the thing we're passing in, does
that makes sense? All I've done is switch
the order of the operands, and now I've got the
equivalent of his ancestor. Of course this would work
on Postgres as well, but Postgres has the operator and
therefore the method to do it. It's perhaps a little
clearer on Postgres, but it's not a difference
in functionality at all. If I run this one, then you'll see that we'll get a query translated as
appropriately for SQL server. There we go. Again, we've got his
descendant of in there. I think we're going
to start getting a little bit different between
Postgres and SQL server next. We've been looking at
all the descendants. If we were looking at Mungo, the direct descendant of Mungo is, Mungo only has one descendant, so perhaps not the best one. Let's change this to Longo, that will be better. Now I'll find now my query is wrong. No hang on. That's
ancestors, never mind. >> I wanted to say something >> We're looking for direct
descendants of Mungo. What we want is these
five, Bungo through Bingo, but we don't want Bilbo, Otho or Falco, so there we go. SQL server has this nice
function, GetAncestor, which basically says given this, give me the first ancestor. If it's two, give me
the third level up. We can use this for both finding a direct ancestor by
switching around, like we said before, or
finding direct descendants. So I can write a query
here in SQL server that will quite easily
find me only things that are they direct
descendants of Mungo. We'll run that just to
prove we're not lying. Yes, we don't have
to run everything, but it's nice to show the code actually worked and
look at the query. There you go, so we're doing
GetAncestor there and we can see we've got Bungo
through Bingo on there. Actually, let's find direct ancestor
in SQL server first as well, just so we can get out
of the way because it's basically the same thing but with, again, the GetAncestor goal
on the other side of it. In this case we want to get ancestors so we put the
GetAncestor code on that, operate that side of the
equation rather than that quite. In this case, I guess I
already change it to Bilbo, we can get all of the
ancestors of Bilbo, we should go all the
way up to Balbo there. Oh, no, direct ancestor, I don't know what I'm talking about. The director ancestor of
Bilbo is as we know, Bungo. Was it Bungo guy who built back end? I can't remember. You don't remember?
I think it might have been. >> These ancestors would
be very ashamed of your understanding of
their family tree. >> I think would be, indeed. Let's go and look at
this in Postgres. I'm actually going to
skip down a bit first, because I've got a
way to do it here and then we've got actually a
different way to do it. The different way to do
it looks very similar to the way we were
doing it in SQL server. Postgres has a lot of these methods, let me actually bring
that up real quickly. On the Postgres documentation here, also looking the translation. Postgres translations, these are the translated functions
for the L3 mapping. >> I hope everybody can see those
funky operators on the right, that's what I was
talking about before. Look at that three character one, question mark at sign greater than, isn't that great?
Isn't that beautiful? >> Anyway, there's two different
sub-path methods here, that basically go into your, if you think of a string
representation and they'll strip off this bit all
or strip off that bit all, you'll find the bit
in the middle of it. You can do manipulations of the position in the tree by
manipulating the path in that way. That's what I've done here. Instead of doing GetAncestor, I've done sub-path and then
take the thing off the end. That gives me my ancestor. Then this looks
basically the same as the SQL server query
did with GetAncestor. Let's run that. We should see something with some
funky operator again. >> Oh no, it's not Funky
Operator. I made debugging. >> Yeah. You chose debugging
instead of output. >> Output. There we go. It's not Funky Operator. I don't know what I'm
doing now. Anyway, am I actually running in the
debugger? Is that what's up? >> No, you're okay. >> Let me run again. I need
to get back into where I was. We don't actually have
an operator here. It's actually a function subpath, so it looks a bit more simple. But you can see we're passing
in 0 and -1. This works. Bungo. My experience of
these things though, the subpaths of trees, things that you have to be
very careful for two reasons. One, because it's easy to get something that matches but is
actually matching the wrong thing, especially if your hierarchy
needs are in the 1.2.3. Remember I said you could do a subpath and it gets
1.1 and it finds 1.1, but it was actually because
you've got the 1.1 of the end instead of the
1.1 of the beginning. When you run it with another value,
gives you the wrong results. That I found quite
hard to get it right. The other issue is they're
not very forgiving. Our GetAncestor is quite forgiving. Postgres. For example, if I wanted to use this
subpath thing, as a way, you could do the same thing that
I did on SQL Server thinking, it's just the same as I used there, so this subpath is GetAncestor. This looks fine to me. I mean, it's doing the same thing. It's saying, If the
ancestor of this thing is that, then it matches. But what happens when I
run this is a big bang. Why is it throwing an exception? Well, the answer is that for some values of h1
path from patriarch, subpath 0 and -1 is not valid. It doesn't return a valid
path and won't get ancestor. You say, it's fine, there isn't
an ancestor, and keeps going. In Postgres, it throws. This is very data dependent. It depends what
you're calling it on. It's quite tricky to make sure that you're only doing
it when it's safe. You can obviously do h1 path from patriarch NLevel is
greater than zero here. Then this should work fine
if I've done that right. But you have to be a little
careful with these things. Frequently, when I was
trying things out, I ended up with it thrown
in a section saying, you've gone outside the bounds, you've created something
that isn't valid there. But now, it does work
because now whenever passing the root essentially
to this thing and it saying the root doesn't have
any ancestor, that's invalid. There are other ways
to do this though. That's why we have these ones
here in Postgres where I'm doing it in a slightly
different way. Which is better? I don't know. But this
is a different way of doing it where I'm just saying, I'm using IsAncestorOf in this case, so Postgres has both
IsancestorOf and IsDescendentOf, but I'm basically filtering
the ancestors just to say, only give me the
ones where the level is the ancestors level, level -1. That writes a different query, but it returns the same results. I have no idea if one's
more efficient than the other or any of that stuff. Even in terms of what's easier
to understand, I don't know. Although personally, I find this
subpath things pretty hard. You have to really think about do I want it from
this end, that end. This is perhaps easy for me, but it's totally up to you. Anyway, we get the same thing there. We can do exactly the same trick for finding the right
descendants here, but we're doing NLevel
+1 instead of -1. It essentially gives you the
same thing as doing that, but it's just a little bit
maybe simpler or more robust in terms of you don't have to
worry about making sure you get correct path when you
manipulate the path. >> Arthur, do you think we
should stop and talk about the more conceptual things or do you want to go
on with the examples? >> Very soon. >> Very soon? >> Yes. But I've got
one more example. Obviously the burning questions for scholars of hobbit
family trees is, who is the common ancestor? Actually, I'm going
to change this to the more compelling question. Who is the common ancestor
for Frodo and Bilbo? Why did Frodo get back in? Surely, it should have gone to
Lotho, you would have thought. If you're following the
normal inheritance stuff, it should have gone to Lotho, but it went to Frodo. Now, this isn't going to answer this question because
we all know that the Frodo have got back end because Bilbo gave it to
him in his will or whatever. Although it's interesting,
he gave it to him, but then he went off to the west and he didn't potentially ever die, so should it actually have ever been Frodo's? I mean, I don't know. There's a legal question there, but it wouldn't been
Lotho's either way. Anyway, enough of that. We want to find out who the
common ancestor of these two are. Again, there may be different
ways to write this. The way I did it is just
find the descendants of one, the descendants of others and where they find all of the ones
where those descendants, the Hobbit is both the descendant of Frodo and the
descendant of Bilbo, ordering it so we get the closest
one and then print it out. >> I just wrote it this
way on SQL Server just because that's what came to mind and gave me the
right answer for the question that I
wanted, which is Bilbo. Postgres, I spent a lot of time
and I don't even still know, I talked to Shay about this
yesterday and I still don't know if what I came up with is even the right way
of doing things on Earth because maybe Shay can talk
a little about this in a minute. But Postgres has this longest
common ancestor function that looks like it's
perfect for this. Is except that it's a static function and you
have to pass two things in. You need to do a query
while you're doing a select many to bring
both of them back. To make it similar to
the SQL Server one, I had both the hobbits queried
first and I was just passing them. But in this case, I'm actually re-selecting them in a
sense because I couldn't figure out a way how if
I'm just passing these in. Then of course this thing
just brings back the LTree, the hierarchy, the
LTree of the thing. You actually have to
match it again in order to get back the
thing that it matches. Shay, sent me this simple function yesterday and I'm like that looks awesome and then I tried
to actually use it. As always, when somebody gives you an answer and they don't know your
actual domain business logic, it didn't actually work for me. Anyway, so this works. We can run this. >> We can also use the
SQL Server approach. You can also do the same. >> Yeah, you can absolutely do the SQL Server approach
and it will be fine. But anyway, I just bringing
this up now because this longest common
ancestor looks so useful if you're trying
to do these things, but it's perhaps not the easiest
thing to use in every case. >> The source of the complexity here is that
like any other function, in general, in SQL longest common
ancestor needs to get two. LTree is right like two hats. It needs them on the same row. Where on a relational database, so it's going over row by row. Then when you want to
invoke a function then you have to get the two values
from the same row in prints. Now, of course, the thing here, the source of the
complexity is here, we actually want two LTrees from different rows. We
have various rows. This is why when you look at the actual SQL the SelectMany
translates to a Cross Join. This is basically a trick of
ours to take two rows and to flip them around horizontally so that you can take
those two LTrees, which were first vertical, and make them horizontal so that
we can invoke that function. Here, I think, definitely compared to what I
experimented with yesterday, there's more complexity than
what I tried yesterday. But the basic principle
of it is the same. We're using SelectMany,
so this has a self-join. This is where we're joining
the table with itself. Then theoretically, all the
combinations of different habits are joined together so that
they're on the same row so that we can call longest
common ancestor on them. Which is a nice string. It's not just an Altria triggered, it's something that's used in
SQL programming in general. Just so that people are aware. >> Indeed, yes. I was going to try and find what you sent me yesterday, but I won't. But it also sometimes
looks easier to do a SelectMany using the
comprehension syntax rather than doing a
SelectMany way you can do a join and then it looks easier. Anyway, this is the end of the common examples between
SQL Server and Postgres. We can answer some
general questions, and then we can also go and look
at some things that you can do with Postgres conceptually a little bit different from
what SQL Server supports. >> I want to say a few things. I think there's a
discussion to be had. We had this discussion offline, but I'll show as an
introduction to this that this question has come
up at least three times, I think in the chat, I'll
highlight one of them. The question is to compare
hierarchy IDs or LTrees with just the plain old style
child-parent relationships within based identity
and foreign keys. I'll show another one basically
that asks the same question. What about id<-parent_id using
recursive CTEs for all that stuff? First, I want to clarify
what this means, for people who are not
necessarily aware. Instead of all this buzz
around Altria and hierarchy, if you just have your
basic relational database, the way to represent a
model is very simple. Every row can have a parent ID, which is just a foreign key
relationship to the same table. Basically, you have a person
in that or halfling in a row. Then each halfling has
a column that says, who has my ancestor basically? Using that, you can basically model, it's the same thing
you're modeling a tree. You're organizing all the rows in
that table into a tree by simply having a single column which has a foreign key constraint
back to that same table. That's a very common way
to model things around. Indeed it is an alternative way to do exactly what Arthur
has been showing you, which is having a
table of halflings. Now at this point, of course, this begs the question of why are we doing it this way with hierarchy ID. First, I came from the
Postgres world with Altria. If you look at how people
use Altria in general, you look at the samples, you see something a little bit different
from what Arthur showed. In Arthur's example here, the things inside the hierarchy ID or the LTree are rows
in the table itself. It's a hierarchy that basically is applied to
the row of the tables. Inside the LTree for a given
halfling, you have other halflings. This by the way is why you
have numbers in there. Those numbers actually correspond to the primary key of
those other halflings. Integrating, we're basically refer, is that not correct, Arthur?
I see you shaking your head. >> They don't have to
respond to the primary key, but they are an alternate
key in some sense. >> They identify the other
rows in the same way. >> They identify the position of
that row in the hierarchy. Yes. >> Exactly. Notably what
this means is if we want to suddenly do some re-parenting
or changing the hierarchy, that actually would require quite a bit of potentially
complex queries. We'd have to go over all the
hierarchy IDs of everybody in the database in order to modify some number in the middle to some other
number in the middle. Just to make sure that people
understand what it means when my hierarchy ID value is
actually referencing rows. Just the alternative,
what I've been seeing in, none of us who are experts in
hierarchy database things. But the thing I'm used to seeing
is a little bit different. In Postgres like the
first sample that you see is not something that
refers to other rows, but just something that
applies a classification or an organization to
that specific role. For example, let's say
you have your employees. So you have a lot of
employees and you want to say which departments these
employees belonged to. Departments are not employees. This is the crucial point here. A department is not another row in the same table that we
were just discussing. Departments can be hierarchical. The engineering department that's
part of the R&D department, that's part of this department. It could be arbitrarily deep, so it is a hierarchy. But the things inside the
Altria generally don't represent at all other
rows in any way. The rows are, you could think
about them as the leaves, the employees, or whatever. But the thing that you shove inside your actual LTree value is not at all other employees or
other things, and other rows. This is why in the LTree
sample, maybe Arthur, we can show that sample with the, I can also do this. >> Yeah, I can. I'm going
to show the [OVERLAPPING] >> Just to show an
alternative thing. >> Yeah. >> Like from the docs. >> Okay. Let me do this. >> Here's an example. >> Great. This is the example
I'm used to also working with. I wrote the tests using, like for the posttest provider, I wrote all the support for Altria
and I use these tests as well. What you see here as categorization
of subjects or whatever. There's top which is an artificial node that's
the top of the root. Then there's science versus
Hobbes versus collections, and under that there's astronomy
versus amateurs versus pictures. The crucial point, sorry. >> No I just, I look to this and I'm
like what is this? Is this really subjects
like collections, like it's just things, interests maybe, I don't
know. Anyway, keep going. >> Maybe. It's some classification
that makes sense for you, for your rows and your table that basically is what L tree
is supposed to be about. That's also why it's about labels. This really is just a
form of labeling that also happens to support
hierarchies trees. That's all that it is, which I think maybe
is philosophically a little bit different from
the hierarchy idea approach, maybe I don't know this. Again, we just discussed this Arthur and me a little
bit in the past few days, but none of us are experts
in this in anyway. But the point I'm trying
to make is if you're using this organizational
structure in L tree, then obviously it doesn't make
any more sense to compare it to the traditional parent ID thing because the relationship
is no longer between rows. You can't just show a parent ID and point to another
row because the Altria doesn't contain rows anymore. It contains some arbitrary, this thing with strings basically. I guess now we can go back to the original question
that people have said. If we want to compare this
with the traditional thing, then that means already that
we're using hierarchy ID in a way that represents
a tree over rows, and where the rows themselves
participate in the tree. It's the first model rather
than the second model and that's a point that
I wanted to drive home. I talked for a long time. Arthur does this makes sense.
Do you want to add anything? >> I can talk too. >> Absolutely. >> I think that makes sense
for the Postgres view of it. But I mean the, the SQL Server
one is called hierarchy ID. In most of the sample databases, it truly is the primary key. If you think about it, is the
difference between knowing your manager or your
management chain. If you meet another employee in
the hall and all you know is your manager and you want to know who's our common
ancestor effectively, you'll have to go find your manager, he'll have to go find his Manager, they'll have to find their managers. But if you know your chain, you can answer that question with
only two pieces of information. That's why these are
more performant and more optimized because any
two notes can know their relationship to each
other without having to write a CTE or all these joins, just trying to figure
out this information. >> I think that is critical. If you read this SQL Server docs, I have a little bit at least being able to do these kinds
of queries efficiently. I can find all the descendants efficiently and do that
without a recursive CT, that in its name has to recurse. You don't need to do that
because you've got all of that stuff encoded already
in the hierarchy ID, that's a big reason I think. >> Exactly. >> I'd love to know why they
added it to SQL Server, like what the use case was? It may pure speculation on my part, I have no knowledge on this, but it makes me think when in
fashion they we're going to store file systems and stuff in the
database that they were like, we need an efficient way to be
able to query file systems and things in the database because that's one of the big
advantages of it. >> I think both of you obviously
hit the nail on the head. The point is that you have all
the information right there on the role basically and you have
indexes that's basically all. I think that is enough, at least for me as a proof guide, that's enough of an answer for sure. It's a beautiful feature, just looking at that. I'm sure we've been throwing
this term around recursive CTEs. I'm sure people here, our viewers aren't
necessarily aware of it. I don't think we have an
example like often to show, but there's a thing in SQL
called a width clause. It's for a CTE, that's a
common table expression, that's what CTE means. Basically, it's the only way where you can introduce recursiveness
into SQL in general, SQL does not support recursiveness except via this tool which
is called recursive CTEs. When you use the parent ID, the classical way to
model a hierarchy then if you want to do something like get me all the
descendants of somebody, then you can't just write a Select
to do it because you only get, if you ask, get me
that the descedants of X then you'll only get
the direct descedants. If you want to go and get
those descendants as well, and then those descendants as well, then that is a recursive query. This is a thing that you can do with this tool called
recursive CTEs. However, that every time is like, I'm not yet another big query that you're doing
in a recursive way. Efficiency-wise, this is
not ideal recursive CTEs. >> Is like doing lots
of joins together. >> You can look at it that way. Exactly. I mean,
there might be tricks and various ways to
optimize and in a database, but it's not going to be like
looking at a specific row. Basically or just looking at
all the rows who's hierarchy ID begins with something which is a pretty simple thing if you have an index and SQL Server supports. I looked into this a little bit, it supports indexes over this which
allow you exactly this query. Give me all the rows
where the hierarchy ID happens to start with one slash
two slash four or whatever. On the post websites,
you have the same thing, so you have efficient indexes
as well on the Altria. But things don't
necessarily as Brice said. I mean, things don't
correspond to rows, it's like a different thing. It's an arbitrary
classification thing that's detached from your rows. Basically it's just
a bunch of labels. By the way, sorry, I'll just say one last thing
and if I'm not mistaken, and Brice will correct me on this, one of the major differences
here people have been asking, can we haven't abstraction
over these two things? Like one big difference
for example in postgres, the labels are just labels. They can be textual as
you saw in the example. Whereas, if I understand correctly, that's not the case
in the SQL Server. >> Like spatial, there's a standard, there's a spatial standard and
all the databases implement, there is no hierarchy ID standard. >> Exactly. >> They invent their own
solution to these problems. >> Exactly. >> I mean, think
from an abstraction. Ties with what I'm going to say, which is that postgres, supports the more restricted view that SQL Server has
over these things. But postgres, being the
overachiever in the database world, can go and do other stuff that is conceptually somewhat different from what the SQL Server thing does. In terms of abstractions, I think you could
imagine an abstraction over the basic concept
of them as IDs and that could be
implemented in both but it would be limited to that. Then I have no idea what MySQL or any other
databases support in this. The one thing that Diego
brought up years ago was, it would be really cool if we could actually map these things
and navigation properties. Like I've got a half ling and then its ancestor is a
navigation property, and then it has a collection
of its descendants. In my object model, I have something that
looks very nicely like a C# object model but
then that translates to hierarchy IDs or L
trees in the databases, and I think that
would be really cool. I think it's non-trivial
to do in some respects, but I haven't tried to do it either. But I think that would be a very interesting
specific abstraction of this that would fit
to both of them and allow you to model these things
into perhaps a more natural way. >> That I think
actually goes back to this question of the parent
ID that people were asking. In effect, the hierarchy ID rather value that you have
there is like a foreign key, to your parent and also
all of their parents. All the way up the chain. >> It's a key that
contains everything. All of it's like primary. >> Like a variable length list
of foreign keys, you can think. >> Exactly. >> You could think about
like something like that. >> We're going to look at
some cool Postgres examples that think about it in this
other way in a minute. Before we do that though, I just want to go back
to something that Shay said which is about
manipulating these things. One of the reasons
why I didn't make it, and I was going to
try and do this on Postgres but postgres
as far as I can tell, you could write this on Postgres, you could do this because
it is client-side stuff, but I didn't have time to do it. I haven't got the
Postgres example of this, but this is on the dotnet bloggers, which I copied and
pasted from what's new in EF Core 8 so either
place, how's this? But the idea here is I actually
do want to take a subtree. If we go back, if I
can find my thing. What did we say? Let me read
the text on here first. Shay reckoning 1752 LongoGate we revealed that Longo is not
in fact the son of Mungo, but actually the son of Ponto. I know. Longo here, not the son of was that,
I've forgotten now. Longo not the son of Mungo
actually the son of Ponto. We want to move Longo. This tree, all of Longo's things
and put them onto Ponto here. This is a place where ideally we don't want the
primary keys of Longo, Ortho and Lotho to change because they're actually
the same entities, but their place in this
tree structure has changed. We use the "Word" re-parenting in databases or object models a lot, but it's literally, it's like I'm re-parenting Longo
from Mungo onto Ponto. SQL server has this
fairly useful function for doing that called
get reparented value, where you basically say, given this descendant, go from Mungo to Ponto instead. You can use, you can load the
descendants into EF Core run list, this is a client side thing. I mean, it probably translates
to server you could do with their execute update
or whatever as well. But in this case I'm doing on the
client side and then save it, and then it's going
to then send back. You can see actually here it's
interesting to see the parameters being sent in the
compact binary format. >> Nice. >> Then after that, now if we do Mungo, we get these things. Well, Ponto now has
Longo in the Ponto. There is actually a case where you can separate this thing out from the primary key and
say actually this is not necessarily the
primary key of the thing. It represents the
key within the tree, but that could be conceptually
different and we might want to manipulate that to move
parts of the tree around. File system for example
copy this folder to here. Now maybe you would
want to do that by deleting it here and recreating
it here, I don't know. But either way, it's conceptually
you're moving things around in the tree without changing their
primary identity as it were. Anyway so that's just an
interesting thing there. Because Postgres has
this representation of it and we're not thinking of it in strictly in terms of keys
in a tree in a sense key, like on the SQL server model, we can do some interesting
things with it. I'm not going to show very
many of them primarily because I run out of time
preparing the demo. But I am going to show
some pattern-matching and free text search that you can do within these things, which is cool. I have in here, I've called these things interests because like I was saying to Shay, I don't know what these
things really are. They don't map very well to an entity type in
this conceptual model like we were talking about. I put L tree as the primary
key in the interest. All we really care about
is the labels in the tree. So I put comments on here like I still some comments
about each of these things. But again, it's like, does this really map to an
entity type in the same way that it does for the other
case? I'm not sure. Then for the seeder, I have this case where
we're doing the same thing, but we're just using these labels
from the Postgres example. Let me go to the
Postgres example here. I'm going to change this
to interests, contexts. Then comment down this stuff. I've got a pattern
matching example here. But what we can do here
is we can actually say that this matches lQuery method, which is going to translate into
one of those funky operators. It's going to my
*.astronomy.* so basically return everything in
here that has astronomy. I'm going to run this, but
let's go back and look here. This is interesting. You'll notice because
astronomy actually appears at two different
levels in the tree. Rather than just being a number, we're actually saying the
semantic meaning to the fact that the Word astronomy
appears here and appears here they are related in a
way that isn't true if this was just one two three
and this was one two three four. Didn't you wouldn't
just because this is one on this these one doesn't
mean they're related. But the fact that this is
astronomy and this is astronomy, and we're saying there's some
semantic meaning to that. We can now run this query which
uses what's the operator it uses? The parameters is this one. It's just the tilde
operator. There you go. There we go. We can again by
everything with astronomy, no matter what level it's at. It's basically doing pattern
matching and because we could do so and see patterns, I don't even know what
this pattern means. Maybe one of Brice or Shay knows what
*.bangpictures@astronomy means. >> I don't know. It's a whole
query language in there, which I don't know. >> It's matching. Let's assume that that's the correct
match for that pattern. For which Shay or Brice guesses
about appropriate more likely. Very similar to that, we can do a full text
search on these things. In this case, here for doing the patent matching
we're doing MatchesLQuery. In this case, we're
doing MatchesLTxtsQuery. We can run this again, and it will use a slightly
different operator. So it's using the app operator here. I don't know anyway because I did this like five minutes
before the show. It's presumably doing
a free text search on the ltrees text. Basically, there's a bunch of
other things here so two pages. We stopped doing show links
sample. We need to do this yet. If we had show links, these would be going in the shown
link, to documentation pages. There's this documentation
page for the postgres support, which talks about all of
these different things. Goes through conceptually, the difference between the
text query and the L query for the free text and whatnot
and then talks about all of the different operators
here that you have. Then the function so we mentioned
subtree and subpath and n level. You can also do an index and
then some other stuff here. Then it talks about creating
indexes on here, which is useful. Then it has some of these
examples where I just copied and pasted and put them into EF. Then on the Npgsql EF
core provide a page, there's the functions of ltrees
and what they translate. So you can just say I've
read that and I know I need the question mark at
greater than operator. Then I'm going to pass a list of ltrees and do FirstOrDefault
and do IsAncestorOf. That's what it will translate to. Postgres has a lot of stuff there. But in general, I think, We've covered why
you probably would, might want to use these things, how you could do it in
SQL Server and Postgres and at least hopefully
given people an idea. Any questions that we should
cover before we end Shay? >> There's one that I want
to speak too quickly. What is the benefit of using
this approach versus JSONB? So basically starting
a JSON document. First of all, what we've
been talking about here, both LTP and hierarchy IDs
are a path within a tree. They are a specific path, whereas a JSON document
represents an entire tree, that means that every row
has a whole tree there. JSON is just a different thing. If you're trying to
model a hierarchy where each row is part
of that hierarchy, I don't think JSONB
makes a lot of sense. Of course, if you
don't, for example, if I were in some other
database like MySQL, which doesn't have one of these. Then one way to do this would
be to store an array of all the IDs of my parents or all my categories
or department names. Since database is generally
don't support arrays, you could use JSON to hold an
array on each one of you rows, but that's not like a
full-fledged JSON document. It's just a way of holding an array. It's important to understand that
conceptually these are arrays of IDs or department names or
whatever it is that you want. They're not a tree. They make up a tree. If you take all of you rows together and you put
all of them together, then you have this big tree. But each row on its own
does not contain a tree. It just contains a
path within that tree. That was my answer around it. The last question for you, Arthur, is will, this sample
code be available? >> Yes, I will put it up on my
GitHub account, ajcvickers. I'll push it up. If not tonight, then tomorrow. I will definitely do that. Also you can also look
in the What's New in EF Core, AKAMS/EF8H-new. That has a bunch of the SQL Server stuff there with runnable samples that
you can get right now, where we call hobbits offerings, but it's the same stuff. But, I'll put the Postgres
stuff up as well. >> It's nice. The samples are great because they're EF code
using these two APIs. But in general, you can
just look at the docs, understand what it is you
want to do and then just, call the methods in our docs which
translate to those operations. It shouldn't be too
hard to piece together. But of course, samples
are always even better. >> I see somebody asking
about MySQL there. I don't know if it's
been asked before. I have no idea whether there
is support for this in MySQL or not, likewise for SQLite. Brice, I don't know, SQL
has anything like this, but both the MySQL provider and the SQLite provider
are open-source. Even if you don't add
it to the variety, you could probably
create a package in the same way the
community did for that. Feel free to go in and
look at it or whatever. It's not something that we on
the team are planning to do. But we'd be happy to help out
with the community advice, make sure it works the same
way and all that stuff. >> Cool. >> I think that's it? >> We're not like super overtime. Only like 17 minutes after the hour. >> Only 17 minutes. Well, it's the record
first. Thanks, everybody. Appreciate you watching. We'll figure out something
to do in two weeks time, probably the day before.
Bye. Thanks all. >> Bye. >> Bye. >> [MUSIC]