Entity Framework Community Standup - PostgreSQL and EF Core

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
[Music] so [Music] so [Music] [Music] [Applause] [Music] hello welcome to another community stand up um i'm here with shai and bryce this week um we don't have any jeremy this week i believe it's his birthday so if you're watching which i really hope you're not and happy birthday um and uh but we have bryce who you know we talk about a lot on the show he's on the eft and does a lot of uh work on migrations and tooling and all that stuff uh migration bundles for ef core six zero so that's going to be cool and uh and shy also on the ef team but before he was even on the af team the author of npg sequel maintainer of mpt sequel the idiot provider for uh postgres and the ef provider for the af core provider for the same thing so he's going to talk to us all about ef core and postgres um so we'll get into that in a minute uh before that let's just check on the state of the unicorn it's all about one thing at the moment and that is that rc1 is on you get now um i would prove it to you and maybe that will encourage you to go download it see there it is uh rc1 7000 downloads since last week download it try it so basically it's feature complete so if you want to see what's in uh it's a of course 60 then look on uh new get now there's one feature coming into rc2 which is a column order attribute uh but other than that everything is in there let's quickly go over some of the things that the highlights we've talked about a lot of this already on previous episodes temporal tables we're going to do a blog post on that uh migration bundles bryce's big feature here uh better cosmos support and we've talked about that as well and there's a blog post on it you can now scaffold from an existing database directly to many to many tables uh we have better support for group by translating more patterns you can set default value conversions so for any property of a given type you can say i want it to convert to a string or whatever else um we have compiled models we talked about that on the show before so better startup time if you've got big models we have generally better support for normal referent types the whole code base now uses normal reference types so when you're referencing ef from your own code you get all of the correct annotations and everything there plus we do scaffolding now of code which uses normal referent types so better support there there's better support for split crews especially collections that are not navigation so when you're projecting a collection you can use those better with split queries now um and perf you know we've talked about this before but uh big improvements in tech and power fortunes from the work shy's done primarily um we're now pretty close like within four percent of dapper perth on that so uh awesome stuff there um there's a dock that's still a work in progress um i'm working on it this week to finish it off but uh this is in our documentation um if you go into releases and plan planning and go into ef course six there's a watch news section and here we have lots of examples pretty detailed so that's temporal tables there then migration bundle showing you how to use all of that stuff the pre-convention model configuration for setting value converters for example compiled models how you do that this benchmark sir so improvements that we've done in in perth for tech in power and compiled models cosmos provider enhancements this is written but not published yet and then there's a bunch of stuff like there new translations so this is a pretty big document with a lot of stuff in it and uh as always with our documents um the the we have samples for them on github so if you look on github in the entity framework docs repro and download this we have in miscellaneous new in af course 6 and then there are all these samples and all the code in that document comes from the samples that you can run and try and debug and all that kind of stuff um since we're here like we also have all of samples for all the other old stuff as well this is a great place to go and uh and try stuff out and figure out how to use stuff so that's ef core 60 the important thing is we need your feedback so the reason we do these rc releases and you know we feature complete now is so that you have a chance to try it before we go rtm after we go rtm it's hard to change things it's harder to fix bugs we have to go through a patch process you have to wait so try it now file your feedback now and we've still got time to get stuff into the rtm release for any bugs that you find or if something doesn't work right or something's really bad usability issue then let us know and we've still got time to fix it so if you're one of those people that's listening to this then you're one of those people we want to go and and get it off new gear and give it a try um it's got go live license uh so you as does dot net core six so you can even put it into production uh first party teams in in microsoft i don't know if i'm supposed to say which ones but they're always already using.net 6 in production so give it a try so yeah try it okay yeah so let me now pass it over to shai who's going to talk about progress cool so hi everybody this is um actually a talk i've wanted to do for a very long time it is a postgres talk but it's also so it's about postgraduate of course but it's really also geared towards generally uh you know database people so if you're if you're using sql server people which is what most most ef people are doing this talk is definitely meant to be relevant to you and i'll call out certain things so the idea is also to use postgres to um you know dig under the hood a little bit and show you how things work and how things can work a bit differently this is not just for people using postgres at all uh having said that i personally believe postgres is the best database in the world so i'm gonna show you stuff i'm enthusiastic about um so what is postgres just like in two seconds uh postgres is a relational database uh exactly like sql server or like oracle or like my sequel um a little bit like sqlite sqlite is a little bit of a different piece there um uh it originated this is it it comes from that you know like it's an old-school unix kind of project this is something that originated mostly in the unix world originally it looks like that it's very very so it's written in c by the way it's extremely lightweight i remember i worked i actually had to work with oracle about 20 years ago a bit more now i'm really old actually and i remember the footprint and the management like installing oracle instead getting to a state where you can actually connect to a database with something with something extremely heavy sql server vastly improved on this but sometimes when i when i install pro squares next to sql server i feel a little bit like this so a postgres like instance running on my laptop takes like uh tens of megabytes in principle uh it's very lightweight you can start and stop it like extremely uh you know extremely quickly and that doesn't mean that it's a less serious database so it has this this postgres is with mysql these are the two major open source databases used in production systems around the world it's considered both of them are considered very good obviously postgres is better but anyway doesn't matter we won't go into that uh every major cloud provider nowadays basically sells you a managed version of postgres so microsoft has it on azure you can go and get like a managed uh postgres you also have hyperscale versions of postgres so similar to um uh you know various other hyper scale things you can go um microsoft bought a company called cytus data yeah i'm not going to spend too much time on this i want to get to the code as quickly as possible but the idea here is that this isn't this is a very lightweight kind of you know open source and free by the way database as well so there's no licensing and so on but on the other hand there's the whole enterprise part of it and officially recognized part of it what you have it is in use in major enterprises you can buy support contracts you can buy them from instances from you know cloud providers it's a real thing this is not like a small thing that is only used in whatever small shops or something like this um in general um since it's a relational database it looks a little bit like you know sql server at the superficial level it's going to look and feel exactly like any other sql database you use sql to interact with it um so at that you know it's standardized it's actually one of the more standards conforming supposed to be kind of database so there's something called the sql standard um unfortunately databases don't necessarily follow it that to the letter that's like a big problem there's a fragmentation there postgres is supposed to be a bit more standards conforming i guess it doesn't really matter to end consumers whether it's that standards conforming or not since other databases are not necessarily standards performing so as an orm author i absolutely love how compliant yeah definitely that's that's cool to hear yeah absolutely and then you know in ef core it's going to feel you're going to see this in two seconds it's going to feel pretty much like like a sequel like sql server or if you're using sqlite or mysql one important point which we never get tired of repeating is that ef core doesn't attempt to be like an abstraction that hides away your database that's always an important point so certain details are going to leak through postgres is not you can't just swap a database provider and everything works that doesn't work because it cannot work this way databases really are different some notable examples is that postgres is um you know case sensitive uh when you do string comparisons by default uh predominantly where a sql server is not so if you're just taking your code across like that that's not gonna work the list of types that you have in postgres and in sql server there's a subset that's very standard like you know you have strings you have ins you have time stamps although that's even that's complicated that's pretty much going to function the same way but the moment you step out of that things are going to be very different so i remember coming from postgres you come to sql server and there's this whole mess around uh you know you have varchar but you also have envarkar if you want unicode so that's already two types that's a bit weird and you have uh you know varchar with you know 50 in parentheses or you have max if you don't want to specify the size and that's very important you can't define us like an index or primary key on the far car max it's all kind of complicated in process you just have something called text and everything is managed behind the scenes and basically there's no perf uh there's not supposed to be any sort of perf impact there's no advantage in saying oh i'm going to do a varchar50 you can but it's a validation thing it's not like a perf thing or it doesn't there's no limitation of indexes anything like this so types work differently they're not meant to be they will never be completely you know abstract across databases when you make a multi like um when you attempt to do a database agnostic you know application you have to keep all that in mind uh do i have anything else i want to say one one last thing i guess which is important is the perf aspect of it uh yeah two two more small things so the perf aspect of it is quite important um uh so postgres the postgres is fast but even more than that a lot of effort has gone into the mpg sql driver and ef core provider mainly the driver it's just a personal personal passion of mine i've always been interested in perf and i happen to be the coasters guy so it so happens that mpg sql is the more performant um database driver in general coupled with the database at least i'm not gonna i think the mysql by the way providers also very fast but then the database itself also has um you know perf difference basically what i'm trying to say is if you look at the tekken power you know benchmarks and you see asp.net up there in the top 10 which is where where we are done it is in a very very good place that's running with postgres that's not running with any other database it's important to mention it that doesn't mean by the way if cor that's by the way running bareado.net there's no ef core there efcor is also quite good in its category of full-fledged orms and so on and as arthur mentioned we've done some major improvements to um non-track query performance so that's gonna probably shoot us up a lot in the ef core um side of things but uh once again the perf aspect of uh totalscoresupport.net is very important to me and it functions quite well and the last thing and that's related to my last point there's a distinct advantage in maintaining both the ado.net driver the low-level database driver as well as the ef layer that is kind of layered on top of it when you control like the full stack you can do all kinds of things which are more difficult so when i'm when i get excited about a feature i can implement it all the way down efcor ado.net whatever whereas when we want something in sql server then you know we work on top of sql client which is a different team which has different priorities that's understandable that's just how the way the world works of course things work a little bit differently and they may be a little bit slower in terms of introducing the newest kind of features i think any any comment on that um no i was just gonna if you i was gonna go back to the perf thing and just say that uh we're looking at dot net seven and af core seven planning and uh we have even more ideas for perth there some of the other tech and power benchmarks we're going to look at those data updates for example um so yeah more more perth stuff's coming there's always more perth to do and we're always going to get faster so i was working on the sql light perf and i was chasing uh postgres i thought for sure like oh i'm just a tiny in process database i can catch it right yeah no i couldn't like it turns out like offloading all the cpu to another computer and going over the network you know i've always been told networks are slow yeah no apparently like like i couldn't even catch it with single light which is crazy to me but on that point uh another thing that i didn't mention because it's not really an ef core six but uh microsoft data sqlite which is one of bryce's projects that he pretty much does everything on that has connection pooling in 6.0 so especially if you use an encrypted database but i have to say i thought it would only really be useful encrypted database now even even when you're just using a normal database connection which is literally opening a file handle right i mean it's nothing but it's still faster with connection pooling so that's an awesome thing on there on sqlite yeah sebastian from the orchard team had some tests and they he saw an amazing throughput increase um i don't know why you'd be running orchard on sqlite but apparently very cool yeah idio.net has this thing where you in general in that and you you tend to open and close connections a whole lot uh like a huge amount so you really lean on you know open and close being very fast via pooling in general and i guess even you know opening a file that's a system call and you're doing it a lot so anyway so before we go into it let's just uh the one question here that uh that comes up quite a lot is it possible to write database agnostic migrations in ef for example one that works on postgres sql and on sql server bryce do you want to answer this sure um i'll start by saying like the up and down methods within a migration they're kind of like a dsl um if you want to hand author some you know it's just like writing sql that would work on both um you get a little bit through our dsl of like migrator.createtable and stuff but it's going to be really hard it's going to be very manual our recommendation is just to keep two different sets of migrations one for sql server i want for postgres it's just a lot easier that way i'll just say that but in theory it is possible um it's going to be a lot of work and you're going to have to test it a lot so this is uh this identity sees that anton pointed out here is is different and that's one of the things that tends to be different across all the providers and so that's one of the things that is often set up differently in migrations so to bryce's point yeah having two separate sets of migrations is generally the best practice you can write conditional code you know you can generate two migrations and merge them together and then have if you if it's running on postgres do this if it's running on sql server do this or mysql do that um it's probably not worth it most of the time but uh it is possible to do that yeah cross databases is a particularly i mean it's hard it's basically hard to keep keep you know the same application working on multiple databases it's not a trivial thing it's like another layer of work that you have to do in general but that's unfortunately the reality of things okay so i guess i'll i'll start diving in um can we bring up my screen now that this would be the the right moment for this awesome so um this is this is going to be a full full-on coding thing i don't have any presentation or anything everything is just i'm actually going to switch and just look at code now i i have no idea what anybody else is now doing with their faces or anything um you'll have to flag me on with audio if you want my attention in some way so yeah so i have a very basic kind of template which i'm going to use to develop stuff on so what you see here is we have like you know super basic blog entity type class it has an id it has a name which is required to simplify some things we've got our run-of-the-mill context we've got you know our db set of blocks and here we've got on configuring which is where we configure our you know the provider and so on and we've got our two um database providers here so by the what what i have uncommented here is sql server which is running on my laptop and we also have a commented outline for use npg sql and we're gonna you know switch back and forth a little bit to show the differences between them i've got simple logging enabled so we can you know inspect the sql and the thing basically just blows out the database every single time and creates it again so we don't have any sort of state leakage so the first thing i want to show so i mentioned this thing with types right databases have different types so one funny thing with postgres they like types a lot they like extensibility a lot so there's a lot of support for very weird things that people are not used to seeing in a database so what i'm gonna do is now mind you i'm on sql server at the moment right so i'm gonna just go and add an ip address um field here from if i'm slow you're gonna have to be a bit patient with me and i will just run this and see what happens so this once again is sql server if we look at our create table then what we see is that ip address is a field of nvr car 45 right so this is what you're seeing here is obviously not any sort of ip address field or data type in sql server right what's happening here people might know about this is value converters so sql server or in general ef core comes with a built-in value converter which if your database doesn't support ip addresses natively what it's going to do is it's going to say okay i know how to convert an ip address into a string so i'm going to i'm just going to set that up for you uh we can even you know what just for fun we'll even see like a small uh let's add like a blog here so a new uh blogs and new i'm going to use the target type the new fancy stuff here i'll put a save changes here you're using a top level application there as well oh yeah oh yeah that's no program main or everything it's writing at the top level so very true the perfect thing for for demos exactly so i'm going to parse i'm going to do like a thing like this just parse some ip address so that ip address class that's something that ships with net in the bcl right right right yeah that's that's that's a good point so this is something that comes with net it's uh there's there's another one for physical addresses for example which we also support this has nothing to do with the of course just some class that comes you know for networking and once again if i run this thing and we look at what it does yep then it throws it throws why does it throw already my demo has failed i've reached my first thing have i oh yeah okay so we need because the name is not nullable we have to of course add a name so that's my fault let's add a name you put required on it to make it easier but it actually exactly as always that's actually going to help me out later so that's that's kind of cool but of course no matter how many times you practice you still get it so if you look at what we're inserting here this is important i want to drive this point home when you have the simple logging kind of thing you see our parameters we have the parameter p0 and here we have our ip address uh this once again is a string what you see here is that this is actually a string okay because value converters have made it so so what i'm going to do now is i'm going to basically comment out sql server and i'm gonna oops and i'm just gonna do npg sql and we'll see the exact same thing running on a different database so this is gonna look a little bit different but not that different so we have a lot of double quotes where before we add these square brackets to which sql server likes so much so things look a little bit different but you guys shouldn't get panicked it's all sql it's all like a database however standard sql double quotes in fact it is sql standard that's actually true but yes i didn't want to i didn't want to be the one to go thank you bryce so what you see here is something very different so name is something called a text once again no n varchar or no 45 no nothing it's just a text field it's just a string but here we don't have a text this is because npg sql and postgres natively supports an ip address type uh so you can actually write that in why is that interesting why would somebody want something like this so it turns out first of all this is basically validation so in a string with with the value converter approach with sql server you can put anything you want right nobody is actually enforcing that the string is an ip address here if you try to put something that's not an ip address you're going to get a validation error so it's better like data data constraint consistency validation and so on but posters also comes with quite a bit of functions so if you want to like uh you know ask if a certain ip is in is in a range like in a subclass and a class c subnet or whatever you can actually do this in the database and that's going to be efficient which is kind of far out when you think about it so if you're using postgres to manage like a fleet of machines with ip addresses that's actually going to be maybe a bit useful for you the main point i wanted to show with the insert here is this looks like it's the same but the db type here is an object so this is not a string this is actual native support for ip addresses in the provider i'm going to step back for two seconds this is not that impressive because how many people actually need ip addresses in their databases not that much i know the point here first of all is that i tried to do this like back in the day maybe four years ago i think with ef6 with the classic not the new ef corsics but the old ef6 and this was not possible the old type system was closed it basically had like we support these types you know string and so on we don't know about ip addresses presumably because at the times microsoft was a bit more sql server centric and the list of supported types tended to correspond to what sql server had so if you're a postgres guy or a mysql and you have another type you want to support you could not do this whereas ef core is uh designed from the ground up with this concept of we're open to other databases and to extensibility and i was very pleasantly surprised with ef core that i could just you know write write a bit of code do a little bit of glue and this actually works uh as is the other one to to to just go into the internals for those interested so what happens in the type mapper is uh so the type mapper is used when we're building the model so when we see a type of ip address in the model it gets passed to the type mapper and the first thing the typewrapper does is say to the provider hey do you handle this type and if the provider says yeah i handle that type which npg sql or postgres does then we're done right we use the whatever mapping the provider gives us because it's done if the provider just says like sql server or sqlite it says nope don't know what that is we then go look for a built-in value converter and so if we find a built-in value converter we say oh okay we'll use that instead so it's kind of like transparent you just if the provider supports your type you get it natively and if it doesn't you get the value converter and you really don't usually need to even know which one of those is happening true and this is the same thing i mean some people know about value conversions because they've used them explicitly when you can we're actually going to do that in a few seconds it's the same idea so there's a set of built-in value converters which are throughout all of them and then if you don't have one that fits if you want to do something special uh uh i'll give you uh just a second if you want something special then you can set one up yourself right in addition to the built-in ones now are there i didn't write all of them in particular the ipaddress one that's true the physical address ones were submitted by a community member there was a community contribution to do those so that was that's very important we also by the way the default is to strings but you can also convert them to uh just like a byte array as well if you want to do that and they support ip6 as well as ip4 does anybody use ip6 i guess they do i don't know but they're there cool so yeah so let's move on so uh the other cool thing about postgres is that postgres unlike any other database i know but i might not know all databases is very extensibility oriented there is a first class concept of an extension in in postgres which you can install into your database these can provide new types so there's an extension that gives you something called l3 which is a little bit similar to what people in sql server know as hierarchy id so it's a hierarchical kind of thing which you can create um in fact all of the spatial support in postgres is part is delivered in a very well-known extension called posgus which is like a huge very important kind of extension it's supposed to be extremely good compared to other in general in terms of you know spatial support so postgres is very very extensibility oriented and it's important that ef core allows us also to take advantage of it but i'm gonna i want to go go ahead and get to the more interesting stuff so again ip addresses nobody really cares that much about so i'll just gonna blow this one away right here what people do want to do sometimes however if we're on a blog is to have let's say um a set of tags that we tag our blog with right so somebody coming into this world with you know not from a sequel kind of uh you know background or whatever would expect to be able to do something like this let's define an array right just like i have a name on my blog i want to have like an array of tags on my blog right why not i'm going to comment i'm going to switch back to sql server and i'm going to run this thing and obviously this is not going to work right because sql server we should make this work in the same way that we did the ip address and there's an issue on the backlog to do that where we can just take uh uh and do some mapping to sql server that works but you're right right now it doesn't because so the question is the question is if there's a value so okay let's let's let's that's actually a very important point which what arthur just said so i'll get to that in a second so what ef core here is telling us is exactly what uh arthur just described so he of course interrogated the uh provider which is sql server in this case and ask do you support you know an array of string and of course no it does not support an array of string in this instance there's no built-in value converter so what we get is an exception and this cannot be mapped okay so now what are our options here so first of all um i'm going to comment this for for a bit if you come from a relational background you want to do things in the you know in the theoretical relational kind of way you don't put an array property we're actually going to do this the right way let's do this the heavy heavy way right we create another thing called a tag we give it an id right here uh let's give it a property called the actual tag name just so we don't confuse things right and now we set up uh so if eve five introduced an amazing many-to-many kind of implicit like a very nice many-to-many support all we have to do is say that a blog has a list of tags we'll call it pads right naturally and we'll do the same thing here i'm going through this because we're gonna see the other patterns that we can do and here we have a bunch of blocks so what we've done now is basically a many to many setup right there's a block with many tags a tag with many blocks and since efcor five when efc is this kind of thing it says okay there's many to many we're in a relational database so i'm going to set up a join entity which is the third table in the middle blogs tags which creates you know the many-to-many relationship let's take a look at this let's actually um run this for a second so if you look at the actual sql happening here we have just created three tables there's a table of blogs there's a table of tags and there's the join entity which is implicit in our model it's not there but you've created it by convention so we've we've got our three tables even more importantly i'm going to do a query here we don't really care about this seating right now what i'm gonna do is um yeah let's just do um let's get out all the blogs and what i want is also to have for each blog include the tags right we wanna like suck it all out of the database like this with async the query that we get out of this is a classical many-to-many query okay which is this thing which you have right here the crucial point here is that we've got two joints here one left join from you know blocks to the join entity and another inner join from the joint entity to the tax three tables two joints um now this this is the way that you're supposed to do it if you're reading a sql textbook right this is like the way to model a many-to-many relationship rather than having some sort of array the problem with this is very typically perf and general heaviness of your model it's a complex model it's also very heavy one of the major sources of perfishers is joints uh this this everybody should be aware of this so in general when you're doing uh you're optimizing or modeling for perf you have to think about reducing your joints in general that's done many times through what's called denormalization you want to basically cut down on your joins in in various ways because finding that and those entries which correlate across different tables is a heavy task you're asking the database to do a lot of heavy lifting okay uh so we don't want to do this kind of heavy thing but we're still on sql server for now right we haven't yet made the switch so what's the alternative lighter kind of way to do this thing i'm going to kill this you know many-to-many model which we have here i'm going to bring back you know our array and what we're going to do here is we're going to set up a value converter to save on some typing speed i'm going to just drop this in from the outside this is this is what it looks like more or less so what you have we're configuring the property tags on blog okay so this is the array property and we're configuring an explicit value converter so this is exactly what happened before with the implicit one with what we saw with the ip address what we're saying here now is when we're converting to the database when we're sending something to the to the um to the database we take that array and we join it this is just the method and.net join it with a comma so we stick a comma in between every one of those entries in the array and we send that as a string when the when the values come back we do the opposite thing we basically split it by a comma and we uh build an array out of that so i hope that's pretty much clear this is actually going to work if we look at it supposed to work it's once again not actually going to work no no no yes we of course left the include which doesn't make any more sense because there's no more entity with with that kind of thing so what you see here now is something that is extremely light so if our goal was basically to keep store a bunch of tags on our blog on each blog row then we've done this in a very very light way yes we are now parsing and joining we're doing a lot of strength kind of stuff when we're reading and writing but this is in a way much lighter and a much better solution depending on what we want to do the problem here is what happens when we want to start asking questions like give me all the blocks that have a certain tag right like uh we want to start uh querying and selecting um blocks and according to a certain criterion at that point we're kind of stuck we could use some uh you know some very ugly hacky kind of string things so we can ask give me back all the blog rows where the tags string contains you know this thing but we have to be careful of the commas right this is this is exactly not the way that you're supposed to kind of program things in general you want to do things in a nice way so we're finally going to do the switch to postgres and postgres this exact scenario is handled in a much better way it turns out so i'm going to just comment this thing out and i'm going to switch databases and i'm not going to change anything else so once again we just have a property with an array um with an array type on it if i look at this now we've created a database and what we have here is a first class array type in postgres so in post every type that you have text and you know whatever time stamp or whatever has its corresponding array type uh it's a native it's a natively supported thing so it really is every it by definition uh you know is available for every single type and you can simply map um uh an array directly to this so this is supported at the ado.net level so you can basically just take a c you know a c-sharp.net array and send it to a postgres array and in efcor this is also mapped so that the mapping goes all the way up that's kind of like this this stack thing that i was talking about before this is as simple as you could think you know you you we don't have to do any sort of joins but at the on the other side we don't have to do any sort of string operations joins or splitting or anything of that sort what's better what's even better about this is because this is a first class postgres type you can actually do operations on it so let's do um what should i do now we can also see this by the way as as a sample because i want to keep showing what this looks like if i do this kind of thing here like a foo and bar our two label names you're going to once again see that we're sending that's in the insert yeah we're basically sending a single parameter here sorry that's the name this is the one with the type array of string and the db type is object so we're inserting directly we're sending directly like a dot and array into postgres which is a pretty cool thing we're supposed to uh have support for printing that better in the of course six are you using six here i assume you are we are on five here i don't know oh i'm five here we are should i switch from the world but one of the small improvements is that it's not the gold license if your parameter is a an array we we give you like a like a little uh at least the first few elements of it we don't fill your whole screen up with the entire array if it's huge but yeah that's true that's actually true we got a request for for you know better support of this kind of thing yeah uh yeah i'm just uh here you know i'm using five for safety like for ultimate safety kind not that i don't trust tf6 anyway now let's let's go and do some more interesting things with this let's say we want to do some sort of query on this so what we want to do now is get out basically the thing i wanted to say before we want to get out all the blogs where uh you know there's the footage basically okay this is already something that breaks down when you're using the value converter thing so the the limitation of value converse at least right now we have some ideas on this but right now when you use something with a value converter you can't call any methods on it in a query right we don't know how to translate this all value converters give you at present is the ability to serialize something back and forth into the database but in the query pipeline and the query expression you cannot invoke any methods and expect them to get translated however since here we're not using value conversions we're using native support and that you know the the the the provider has been written with this we can actually try this so let's see what happens now okay what we see here is some weird thing okay this is already out of the domain of what most people know right what we see here this is the b tag so this is our array here there's this um weird operator squiggly thing which is the postgres array containment operator postgres loves uh by the way operators there's an explosion of weird operators there's operators with three and even four characters i think it's really very very uh very advanced and here what you what you're seeing this is a little bit weird foo is what i actually wanted to ask this is the containment right but what we're doing is constructing an array around a foo so it turns out for some bizarre reason i couldn't find out uh the containment operator is basically a set operation so what you you put an array to the left and array to the right and it's going to tell you if this array is contained in this array but there's no it doesn't have an overload that allows me to just put a scalar here so what the provider does when translating this is it builds an array out around it which is fine i mean there's no problem with doing this and then we ask whether this array the tags on every row contains this array which is the literal uh thing okay of course since you know i'm i'm me and what i care about is birth then what we're going to do now is we're going to switch into a perfing and we're going to look into how this thing fairs okay nope i copy pasted it incorrectly one second that's interesting because we had a question here about some complicated thing about the end text heap and all of that stuff if you have ins and everything so uh yeah that's related to your perfecting maybe maybe i need to understand the question a little bit better i guess okay i thought you might understand it's all it's all uh postgres to me what's with the stuff okay so first of all i'm just gonna execute this query right uh just so that people see when i have the response here we have just one row right that we seated in our application you can see this column and here you have this what looks to be like this composite thing it's a string but in fact it's actually an array so this ring is just a representation i hope everybody understands that this is properly like an array in on disk you know in postgres it's just being represented here as something like this now when you want to know how something fairs you want to look at a query plan every time i talk about this whenever i have an opportunity you have to always take a look at the query plan basically to know if your query is going to run fast and if it's going to use indexes and and whatnot right so in postgres the way to do this this is different from database to database is you basically prepend explain like this okay so i'm going to run this thing and instead of getting a response what i get is a query plan what we see here is that postgres is now doing a sequential scan sequential scan for people who are not aware means that we're going over the entire table in this case the table is pretty small right it doesn't have a lot of rows um and it's applying our filter so this is how it's kind of planned to do this now if we want to start looking at perf the first thing we want to do is probably to add an index to this thing right that's like the first thing you do when you want uh proof so i'm going to add an index now on my um on my thing arthur feel free to stop me at any any point yeah i will after after you uh get through this maybe we can talk about json columns a little bit absolutely i have that on my plan as well i just hope i'll have enough time to get around on that so yeah okay i mean i have a section uh meant for this i just hope we will have enough time but i'll try to not yeah i'll try to get ahead so let's i'm going to recreate my database now so what so we can see my my index actually getting created here right so my index is here and now i'm going to switch over to my let's get this down here and we're going to run this i'm going to see this running much faster right we're not going to see a sequential scan anymore so we are seeing a sequential scan in fact can anybody here tell me why we are seeing a sequential scan can anybody help i can't this is database stuff i don't do databases so what's the first rule of checking performance in in databases is you have to have proper data in your databases if you have a table with one row that's just not going to fly always why even if we have an index with one row that postgres makes a very very easy decision postcards basically knows i have an index but what do i care i have a table with one row i'm just going to check it basically and there's a threshold there even if you put 100 rows it's still going to be more efficient to actually go and do a sequential scan over these rows there's some sort of cut off very mysterious nobody knows nobody should actually know this there's some sort of cutoff at some point there's enough rows and postgres the planner the query planner decides okay now i'm going to start actually using this um this thing so the main lesson here for everybody whether using sql server postgres sqlite or whatever if you want to get actual query plans if you're doing perfwork on your queries please seed your database with actual real-world data if your table has a lot of rows you need to see a lot of rules so what i'm gonna do here i should i should have known that because we had we did a show on this right on curry plans and you said that in that show so i should i should have remembered in fact every time i do a show here i basically repeat this thing like the next time i'm gonna do a show we're gonna have this moment again and and i also remember i don't know who brought it up maybe it was you or somebody else on that show but don't make sure you've got a similar kind of uh variety of values don't put null in all of them don't put the same string in all of them because sometimes databases are very smart about that kind of thing it will still do different things if you if the data doesn't look right that's true yeah so the best way is to have an actual copy of your production database but be careful of you know gdpr and privacy stuff but basically you want it you want something resembling your production data data set as much as possible that's that's the ideal world so i have a fragment here which i copy pasted which is going to insert 500 000 rows which is well enough to uh you know satisfy postgres instead of in terms of using the index that takes like took six seconds before yep about six seconds okay so now we have so many rows that obviously this is gonna not work right so i'm gonna run this and obviously it is still not going to work so what you see here is something a little bit different this is this is how presentations work there is something a little bit different postgres now uses something called two planners so it's actually doing something called the parallel sql scan it has enough data that it's decided to spawn out to run the query in parallel this is like some improvement that they did in in some recent version of postgres i think in 13 which is what i'm running on we don't really care because the hated words sequel sequential scan are still there this is what i wanna you know eject from this this thing i don't want to see sequential scan over 500 000 rows right so what gives what's the problem here obviously i've done my homework and i've run this before and you know i'm not i'm not getting caught here with my pants down and so on the thing is is that what we're looking here at here is a very is a much more complex thing okay we're not asking so a normal index what does a normal index kind of do if we do an index on a string we're basically gonna um reply to all queries that check for rows where that column has that value but what we're asking here is get me back all rows where that column contains a certain value so what you're seeing here if if we were if our query checked if the array equals to some array then a normal index would work because just like you know just like similarly a string column equals a string but what we're asking for is a containment operation here and a containment operator operation requires a different kind of index so postgres has this idea of index methods or we can call them index types here to not use weird terminology you have various different types of indexes which you can define um i'm not going to go too much into this but generally when you have a complex thing like an array and you want to search for something inside that array then you use what's called a gin index you do it like this so this is now we're in postgres specific land we're telling postgres to create that index with the gen method gen is an acronym for generalized inverted index uh which basically means that the data is pointing from so we're inside the the array right we're in elements inside the array and the index maps it back to the array anyway i'm not going to go into the theory too much because it's not that interesting i'm going to run this again so now we're doing really pro specific things the index that gets created now has this weird bit here at the end called using gin okay this is this is the the weird stuff let's say now this thing blew away my database so i'm going to see that once again which is what i'm supposed to do so i'm building a different kind of index now which hopefully is going to be able to support support this query that we're running here and now we finally have a very different query okay now we have an index scan as opposed to a sequential scan yay yay right exactly to summarize just to make sure people understand we've stored an array into a column now which is a very very weird thing and we've also created an index which supports containment queries into that thing now i think that's pretty cool uh i'm not sure if people actually appreciate this this gives us a lot of you know what you can do with many to many but without all the uh you know the heavy normalization and joints so this is it's a sort of it can be seen as a perf feature if what you're looking for is perfect can be looking as a simplification thing if what you're looking for is simplifying your models and there's various aspects around this anyway i find this like an extremely cool kind of feature should i can i show just one more thing around this this cool array thing yeah cool online for time anyway i need to uh well we're 6 45 oh sorry that's uk time we're 45 minutes in that's uh okay yeah that's not so good but i'll run over by about 10 minutes i'm guessing so the one the one extra thing that i wanted people to see uh this can go even further this is like a very cool thing let's say we have multiple tags here that we wanna uh filter on so we have the tags foo and bar here and we wanna find or foo doesn't really matter and we want to find all blogs which have any of these tags okay this is like a much more um serious much more complicated query it's not just a simple containment thing so the first question is how we can express something like this in link we want all blocks where their tags array contains something that is also in this array that we have here okay so i was thinking about this at the time and i came up with something like this so we basically are going to ask if the tags has any tag which the tags from the outside contains so this is a kind of a complicated link query now right it's um i hope people kind of get the idea of what this is this is basically if you think in set theoretical terms this is an intersection right we're basically asking if the intersection between the rows tags which is the tags and the external parameterized tags if the intersection contains something if it's not empty okay if i run this thing this will actually work this is the amazing thing what you see here is something that's actually far simpler than the linq query this is not something that happens that often it has it so happens there's an operator for an intersection which returns a boolean so whether the intersection is actually empty or not and that's perfect so what the ef core postgres provider does is it identifies when it's translating linked to sql it identifies this pattern which is already quite a complex thing we call this pattern matching in the expression tree and when everything aligns correctly then it will emit what you want and this query also uses the gen index that we we defined earlier i'm not going to show this because i hope people trust me that i'm not lying but this is actually um a fast operation uh and it's quite a complex one as well so i think i personally think this is pretty cool are there any any questions up to now um there's been questions around uh you know whether or not it's actually faster than using a normalized database schema and things like that but i think you already covered a lot of that um yeah do a benchmark you will see every join if you do like real world data you're doing like if you're doing a many-to-many so two joints you're gonna that's gonna you're gonna feel it absolutely and then we had uh had this question about shouldn't it be named for npg sql has method or something awesome yeah so um i can answer that i guess uh yeah um you know when we when we were first working on ef core we uh we did have these long method names to avoid uh making uh cl having clashes between different providers um it makes it nasty for 99 of people who are using one provider or whether methods don't clash anyway so if you really have two methods name the same and you need to use them both in the same application you can switch to instead of using it as an attention method just using it as a regular method call which isn't quite as pretty in your code you can add your own extension method uh somewhere that wraps it and does your own thing but it's kind of one of these things where balancing it will like do we really want to make everything you do on sql server be for sql server do this and everything on post goes for postgres do this and we decided no that's just making it making it horrible for everybody so yeah we changed that um and then i guess uh going back to um what was being asked before was uh around jason collins and like could could this not be stored as a json column and then use the json query stuff to query into it right yeah so that's that's a good that's a good question i'm not sure i'm going to have a lot of time to talk about json unfortunately um you definitely can have a json document jason is something that came later so arrays were here and post was like a long long time ago and at some point jason appeared there's still a lot of value specifically in the race uh because they're more constrained and more restricted a lot of things uh many things are more are more um already optimized and done for them a lot of the things that you do here i i would say like this so mpg sql does support json mapping if what you want to do with the attention is to map an array use a postgres array there's many advantage in doing this if you want a hierarchy called data structure that's going to also be like um weekly type so you know key values and things and whatever sure use json but if it's really an array of ins or an array of strings like in our example like arrays are the better fit so we have a question there when can we expect rc one shot i'm moved by how much i mean so it's been a week and i've received so many uh requests already so i apologize to everybody that you know rc1 isn't out yet uh there's like one big change that everybody will see once it's out that's been you know been taking more time a few more days like really a few more days i promise this is like my highest top priority thing on my on my plate right now so uh going back to the json thing um so there's been questions about you know can can we do this on sql server and i think you know like is there equivalent to this the gen index on sql server i'm not aware of anything but if we if we think about the the json thing it may not be as specific and as optimized uh on uh as uh you know using the array type but you could use json on sql server or postgres or mysql unfortunately the postgres provider and the mysql provider have uh three f core have support for that json mapping json columns and querying some we haven't done that on sql server we hope to do it in 6-0 but it got bumped out um resourcing so um yeah once we get that support then across the board you should be able to do things like this with json columns which will let you run queries into the json and and that will be a pretty good uh more general solution for multiple providers great okay i'd like to show one more thing around arrays i'll try to do this as quickly as possible so that we do get to um to talk about json just a little bit i'm going to switch back to a sql server which means i need to have my uh you know my conversion and all that kind of stuff for a second and what i'm going to do is i'm going to flip it around so i've commented this out and what i'm going to do is a different kind of query i'm reminding you guys we have a name field as well that's by the way also required so what i'm going to do now is i'm going to do the exact opposite instead of checking blogs which have a certain tag what i'm going to do is we're going to query for all blocks where the name is one of a list so i'm gonna instantiate an array here uh blog one and blog two and that array needs to contain b name in order for the row to match so we're in sql server back back to sql server land again because i have something very important i want to show people hopefully i've done everything right and i don't get an exception now wow amazing yeah okay yay exactly so this is this is fine right everybody's used to this this is what we want the contains thing that the contains construct in um c-sharp and.net is translated to a concert called in i hope everybody's familiar with this code nothing to see here the interesting bit the important bit is what happens when we parameterize this array so if we don't want to check for the same two blocks every single time but we get them as a parameter say from the user log names let's call it then we're going to have an external thing here names not blogs names exactly and we're going to run this again on sql server and then we're going to see something very interesting so what you see here is the same thing the point here is that usually when you pass something in as a parameter like this from outside of the query efcor parameterizes it okay it sends it out of band it's not in the sql it's going to be a parameter that's sent out of band and the sql is going to be the same regardless of which parameter you sent uh this is important parameterization is important uh for in ef core at least for matters of perf because the moment you have the same sequel all the time then query plans are getting reused ef core internally caches various things it's very important for perf not to switch your sql all the time however in uh in the world of sql server since we don't have an array we don't have an idea of an array we can't send this blog names which you have here there's no way it doesn't mean anything to do this the only what what if core has to do what you of course forced to do is to do what's what we call internally constant expansion so we take that array and the moment we we need to execute it we're going to expand it into constants and inject it into the sql now once again that's it makes the query work as expected the results are correct but there's various perf problems around this and this is not a trivial thing people using uh this kind of pattern with a huge amount of say this array could have like a thousand things right if you're running this kind of query a lot you're basically bypassing your sql server query cache you're causing internal ef core caching to not work as efficiently as it could that's not a good thing uh now once again if we switch back now and go just just to interject though there are also additional uh complications around parameterizing uh variable length parameters so uh for example if you if you have a contains and sometimes it's got 10 elements sometimes it's got 15 something got 27. and every time you run it it has a different number then if you try and parameterize that on sql server you end up with basically different uh query caches uh different entries in the query cache for all the different numbers of parameters and that can be a problem for your perf as well so that whole area is pretty complicated we do some stuff now we have on the backlog to do some other stuff it's something that diego vega when he was on the ef team used to think about and come up with ideas for a lot um but it's it's a pretty complex area so anyway back to back to the postgres yeah unfortunately it seems so simple right you're just doing contains like in your linq query it's a world of pain behind behind the scenes really it's really very complicated to do however when you have a first class array concept in your database which is what postgres does and we run the exact same thing once again i hope everything is is correct i've successfully transitioned back to postgres we're going to see something very very different so the construct that you see here there's no longer an in uh where there's an any word and what you see here is a parameter since postgres has a first class idea of um of an array you can use it not just as a column you can also send an array parameter it doesn't matter what length it is exactly what arthur was referring to before you can send you know one query execution is going to have a length of two another query execution it's going to have a length of four postgres doesn't care about this this runs extremely efficiently compared to the sql server one because you have the same sequel every single time and that's once again a pretty pretty a pretty cool thing uh i'll just say one thing i'm going to jump ahead a little bit because i want to get to like at least a few words on jason this thing also runs efficiently if you put an index on name okay in this instance you're going to just use a normal a normal index just like any other index if you look at this if you think about it a little bit this corresponds to asking whether bname is equal to the first uh element of blog names or if it's equal to the second element of block name so you can expand this logically speaking in your mind to a bunch of ores and a bunch of ores can be of course optimized optimized by using an index so this this thing is also again very very uh optimized but notice that the translation here is very different from what we saw before we don't use the array containment operator like before we could have used it you know that's quickly with the uh at greater than sign it's an alternative translation but each one is going to use uh indexes differently depending on whether this is an uh just the string or whether this is an actual column which is an array we're looking into so the provider is actually using a lot of smarts when it's doing its translation to choose which construct and sql to use so that you can use your indexes this is basically my point here translation here depends on whether your array is a parameter or a column i think that's a pretty uh uh pretty weird point we don't have a lot of these cases in um in an effort we do have some of them but this kind of you know different expression based on parameter or constant or column i find pretty cool and it's nice that efcor can do this again one of the advantages of using an orm i mean you often hear people talking in the disadvantages of using rm because you can write more efficient sql but at least i know that i don't know all these things i don't know when i need to do this translation versus that translation versus the other you know and so having a provider that's actually able to figure that out for you you can end up getting more efficient sql out of n than you would write by hand unless you know everything in the world about postgres and sql and then fair enough but uh i think for the average developer it's one of those big advantages of using an orm i agree i tend to agree so usually you should be looking at your sql and you should be understanding what's going on as a developer if you care about perf but it's true that in this in this sort of instance this is pretty pretty persuasive kind of demonstration why an orm can actually sometimes be a little bit smarter than you in terms of crafting your sql and avoid certain pitfalls and problems even in a provider-specific way so it's going to do the specific crafting for the specific provider for this specific database which is i think a pretty a pretty cool thing okay i'm going to jump to the to the thing i've been promising we do have let's say about 10 minutes so i at least i'm going to do like a demonstration um a minimal kind of demonstration so postgres also has um json support all databases nowadays have json support it's part of this move to bring in this kind of nosql um mentality or philosophy into back into the relational databases and it's a very cool idea we no longer have to choose either you know the traditional relational world or uh you know pure nosql document database like or cosmos or whatever we can mix and match a little bit which which is obviously you know the right way way forward so you can model certain things in a relational way where you have a very strict schema and benefit from that thing and where you need a little bit of you know fuzziness and documents and contain things rather than you know normalize them out into a many-to-many relationship then you can use a more no-sql kind of thing in fact what we just saw was a nosql kind of thing right what we did with the arrays was a nosql thing but nowadays it's all about json right that's like the cool thing like for for a very long time post quiz actually has uh two json types not even just one it has two types sql server by the way doesn't have a json type it has a set of functions and operators which can be used to manipulate json data in text columns which is a different kind of approach in postgres the primary kind of way to do json is called json b or json binary that's basically just a type that contains uh json in your database but crucially it actually it doesn't contain it as a text so you don't just dump adjacent text into your database what it does is it parses it and has some sort of highly efficient binary representation for the hierarchy of the json document and that allows us to do various things in a more efficient way for example if you want to project something out of a json document okay let's say uh i have a json document now and i want to project you know some subtree of that thing if you do this on sql server or on postgres with the older kind of type then put your database basically has to parse your document each time in order to extract that thing that you want to project out whereas when you have something that's already parsed and you know in an efficient binary kind of structure then you don't need to do that kind of thing it's already parsed on disk in a way right but again let's jump directly into the code as quickly as possible let's say that i want to you know have a json document here so the simplest way that i can do this in postgres is i have a string property here this is not the cool thing what i'm showing you i'm gonna go go to the cool thing very quickly and all i have to do is say right name equals json b now this is a completely generic ef core mechanism has nothing to do with ef core with postgres sorry what it does it definitely has to do with ef core a little bit what it says is this string property by default we would map to a text column in json in in postgres right what i'm instructing ef core to do now is to map it to a json b column instead so if i create this database now x tags x where did i leave some tags no more tags or adjacent now so if i create this now what you're seeing here is that instead of text which is what name is this is a json b document at this point this is not amazing what it does do it once again is it it parses it so that the value is stored in an efficient way it also validates so you can't insert invalid json into this which is once again an improvement right if it's just if you're just shoving json documents into text columns you have no guarantee that what you have in there is correct but the more interesting thing that we usually want to do now is to actually map a poco so a plain old c sharp object into this kind of thing so what i'm gonna do is let's say i put some class and i put whatever some in property views on it and now i'm gonna just switch type to be details so if i run this we're going to see the exact same thing this is a little bit weird because what we've done now is we've created a c-sharp type right a dotnet type and we've created a property with that type and we've mapped that to the to the database and what we see here is this okay let's see that thing into the database now so we're going to do details equals new once again i'm a big fan of target typing views equals eight so we're now saving [Music] this thing into the database all right so we have this insert thing right here and if you look at what we're actually seeing inside into details this is an object here okay this is not a string this is let's say a json object at this point if i switch to my um let's zero this thing out if i look what i have in my blogs table here and what you see is once again something that looks like a json document but you'll have to believe me behind the scenes is not a json document as is it's a binary representation of a json document and it's inside one single cell so one column one row i hope that that's kind of clear the interesting thing of course is when we want to start querying into this kind of thing so what i can do now is ctx blocks now if i want to start drilling down into this thing i can do this so what i'm let me do this after i see my data that's going to be a much more intelligent approach so what i'm doing here is i'm doing a query and i'm asking ef core to get me back all the rows where the details column which is a json b document once again has a views key whose value is eight this all looks like plain old.net we're still doing just link it just looks like a normal kind of thing but this you know usually when you see this kind of thing in ef core this means that we're traversing a relationship right there's a related entity and we're going to do a join or maybe a split query right whereas here we're in nosql mode okay the document is inside that row which you know which is the blog it's in there just like in cosmos or mango or whatever and if i run this thing this is actually gonna work and you're gonna see some weird stuff right this thing i promised you operators right weird operators here here's i've delivered on my promise this is a weird operator which is a drill down into an element inside a json document okay this is obviously it's obviously recursive i could drill down into a lot more i'm going to do maybe a very quick just to persuade you guys that this is extremely cool let's say that this thing has uh let's create a class of orders and on this is going to be like a price none of this makes any sense in terms of a model but we don't care so we have orders here okay and now let's say we want to i don't care about the seating break this down and i'm going to say where instead of use equals eight orders dot length equals eight okay where there's eight orders on that details thing so if you look at this the provider knows how to translate this as well or it doesn't [Laughter] cannot get the length the array length of a scalar what have i done wrong is orders huh where's the scaler here interesting it worked before it did work before you'll have to believe me on this okay we're a bit out of time so i'm not gonna you'll have to take my word for it um that i'm not lying there's a there's a variety of you know operators and functions which you know the provider knows how to translate uh there's an indexing uh story here as well so you can create indexes once again they're going to be gene indexes because they're from the components inside the json document outside their inverted indexes as we call them in postmaster there's a certain set of scenarios where you can actually make this thing very fast so use an index exactly like you saw before with the arrays it's a pretty awesome thing the thing where this where this becomes limited uh this this was done as part of the postgres provider as uh arthur said and also mice in the mysql provider you have very similar support it's it's done in a way that for example doesn't support partial updates for example we can't uh update a json document and replace one part of it for example using efcor you can't do this efficiently you can replace the entire document but not a partial update we plan to look into this kind of thing in for e of course seven i really hope so the the aim is also to bring all this goodness to sql server of course with with whatever you know support is there we'd have to of course look at you know what kind of json support is compatible across all databases and provide some sort of lowest common denominator which hopefully will be substantial enough so that it's interesting uh but that that is a very exciting thing i personally believe this is one of the major things that ef core can offer which is to query on json documents inside the database once again using pure plain link queries like you're used to using c-sharp like in general is your value converter still on there from sql server you're probably right well done no it's not no it's not i deleted it but this worked i promise all right well you don't have uh i don't i didn't have two two major catastrophes i had just one major catastrophe so but enjoy the cat yes alice says uh i don't often use databases but when i do i use postgres postgres the database alice uses yes exactly um yeah so we're over time now i think we've answered a few questions there's been people in the chat answering each other as well which is always great to see um can't answer the question when you should use sql server and when you use postgres this that's always a very complicated uh answer um use what works for you you know what what makes sense for you basically um but uh if you do ef core is a good choice for both of them so hopefully you'll use the f core either way um so uh with that unless uh unless your price you have anything else i think we'll uh we'll call this one and uh thanks shai very much for uh doing all this postgres stuff i'm sure there's a lot more that we could have shown there but it was really it was really good digging down into some of those uh detailed stuff yeah maybe we'll have to do another show in a year or so yeah okay okay thanks everyone and uh i have to find the right buttons to press to uh end the show so let me uh let me see with alice on it with alice yes here we go okay thanks for watching everyone and we'll see you next time [Music] you
Info
Channel: dotNET
Views: 7,962
Rating: undefined out of 5
Keywords:
Id: Ya_cmZRwACM
Channel Id: undefined
Length: 72min 31sec (4351 seconds)
Published: Wed Sep 22 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.