.NET Data Community Standup - EF Core internals: IQueryable, LINQ and the EF Core query pipeline

Captions
[Music] Hi, here we are again. We're back with another .NET Data Community Standup, with Shay and myself here from the .NET data access team. Shay is basically going to talk to us about how LINQ works (Language Integrated Query) in .NET and C#, and we're going to use EF Core as the basis for that, because that's the primary LINQ provider that we work on, but some of it will be relevant to any LINQ provider. Some of it will probably get quite complicated, but hopefully there's enough there to give people an understanding of the basic concepts, which are really interesting to learn and not always immediately obvious when you start using it. So I'm really looking forward to this, and hopefully you are too. No State of the Unicorn this week, because I was looking through and not much has changed from last week. I will say EF Core 8 Preview 1 and .NET 8 Preview 1 are both out now, so you can go try out new features on those, and we have some docs on that as well; well, "what's new" docs more than getting-started docs. And then, of course, 7.0.3 is out now, and that has the majority of the patches that we've done on 7.0, so that's a really important patch release. 7.0.4 should be coming out next week, but that has fewer patches than 7.0.3; 7.0.3 was the one where we were able to get the most stuff in after the initial release. And then of course there are always the daily builds, which give you the most up-to-date stuff. It's always well worth trying things out in the daily builds. When we do bug fixes, we make sure they go in
there, so you can use those, even if it's only temporarily while you wait for the fix to come out in one of the releases. Enough of that; let's go talk about LINQ, Shay. Okay, so let me add this slide thing. This is actually going to have a PowerPoint presentation as well, which is something we never do, but the fact that there's a presentation does not mean this will not be interesting; that's just the thing I wanted to say. So yes, as Arthur said, this week (and I've actually been wanting to do this for a very long time) the idea is to do a deep dive into how LINQ works, and specifically to discuss some aspects of how EF Core works when it processes your queries. So this is definitely a "how the sausage is made" kind of session. This is not something that's immediately useful for you; we're not going to discuss a cool new feature that you can use in your programs. But I do believe there are definitely some aspects which are going to help you. Hopefully there are going to be some aha moments where you understand, "oh, so this is how it works", and that's actually going to help you write better queries, or at least understand how to avoid certain mistakes. So I really believe that this is useful to anybody using a LINQ provider such as EF Core, but we'll see. I'm looking forward to getting some feedback. So ask your questions. We will take some of them when they come in; for others, there's a thing in StreamYard, which we use, where I can mark them and then we'll come back to them later. So if we haven't answered your question right away, don't worry, we'll come back to questions later. But yeah, sure, ask them in the chat. Exactly. I'll try hard to make some stop points in the middle and we'll handle some questions, so definitely write them, and we'll try our best.
So first I want to do a little bit of disambiguation: what do we mean when we say LINQ? LINQ is a very famous C# thing, Language Integrated Query. Usually when people talk about LINQ in the world of C#, they're referring to this very simple thing. Let's say you have an array of blogs, or anything else, it doesn't matter. LINQ is a super cool thing where you can use functional-style operators, a very functional approach, to do certain sequence operations, or whatever you want to call it. You have a catalog of operators like Where, which filters, and you've got OrderBy, which sorts. We call this functional programming because these operators typically get another function as their parameter: they get lambdas. So it's a functional kind of approach; we're passing around functions and so on. Our starting point here is blogs; we can call this our root, the root of the data. Then we continuously compose more and more operators, and in the end there's some terminating operator like ToArray that's going to cause this whole expression to actually get evaluated. One interesting part of LINQ as I'm discussing it now is that when I do this Where, nothing actually happens. It's a lazy evaluation technology. It just composes the operation, and then this composes another operation on top of it, and when we evaluate it, then and only then does the entire thing get evaluated. This has all kinds of interesting properties. Of course I don't even need to run this; I hope everybody knows enough about LINQ. This is basically the way it works. Now, the important thing is to realize that these operators, as you can see in my documentation here, are over something called an IEnumerable. IEnumerable is the abstraction in .NET for something you can enumerate over. What does that mean? It could mean
anything. It means the data is not even necessarily in memory, like an array; it could be coming from some place on the network. We might not have the entire stream in memory at the same time. Again, it's very lazy. But the important thing is that these operators actually do operations in .NET. Now I'm going to do something maybe a little bit ambitious: I'm going to F12 into this. I'm going to go into the sources of Where, because we're going to contrast this with what happens later. Now, it doesn't exactly matter how this is actually implemented. You see that we're now inside the .NET BCL, inside the .NET sources, and there's a WhereEnumerableIterator. If I look into it, I'll see that at the end of the day there's something that goes over the data, that enumerates over the data. There's an enumerator underneath that we get out of this enumerable, and it's going to call the predicate: this is the function, the lambda, that we passed it. We pass it each item, and if the predicate returns true then we return the item; otherwise we just continue to the next one. So this is how Where is implemented. It actually looks at the data one by one and calls your function; it returns the item if the function returns true, and moves on to the next one if it returns false. Makes total sense; it's exactly what you would expect. Nothing to see here, move along. Now, when we move to EF Core, we're moving to another aspect of LINQ which I think is much less clear to most people; everybody knows what I just said. In EF Core I have a BlogContext, and I create and seed that database. I'm not even going to run this query as such, so it doesn't really matter. Let me make this exactly like the other one, just for symmetry's sake. We have something that looks very much like the above.
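As a rough illustration of the enumerable Where walked through above, here is a minimal sketch of a lazy filter. MyWhere is a hypothetical name used only for this example; the real implementation in the .NET sources uses specialized iterator classes, so treat this as an approximation of the idea (go over the data, call the predicate on each item, return the item if the predicate says true), not as the actual source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Enumerable.Where (not the real BCL code).
// Because it is an iterator (yield return), its body only runs on enumeration.
static IEnumerable<T> MyWhere<T>(IEnumerable<T> source, Func<T, bool> predicate)
{
    foreach (var item in source)
    {
        // Call the lambda on each item, one by one.
        if (predicate(item))
        {
            yield return item;
        }
    }
}

var calls = 0;
var numbers = new[] { 1, 2, 3, 4 };

// Composing the operator does not invoke the predicate yet (lazy evaluation).
var query = MyWhere(numbers, n => { calls++; return n > 2; });
var callsBeforeEnumeration = calls;

// A terminating operator such as ToArray enumerates and runs the filter.
var result = query.ToArray();

Console.WriteLine(callsBeforeEnumeration); // 0
Console.WriteLine(calls);                  // 4
Console.WriteLine(string.Join(", ", result)); // 3, 4
```

The counter is only there to make the laziness visible: the predicate is a real delegate that gets invoked in .NET, once per item, and only when something enumerates the result.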
There's the same Where and there's the same OrderBy. The starting point is different, and this is very crucial: the starting point is no longer an array, it's what we call a DbSet. EF Core defines something called a DbSet; we can call this our query root. This is the thing that represents the data that I'm going to apply my operators on. So this is the query root, a bit like how blogs here was the root of the IEnumerable thing. But if I hover over Where, then something else shows up in my documentation: this is no longer the IEnumerable Where, it's actually a Queryable Where. So now we have a completely new concept, which I don't think most people here are actually aware of, even though you use it every single day with EF Core. You could think about IQueryable as something similar to IEnumerable. What it means is that it's not a stream of data that we can just enumerate; it's basically an abstraction over something we can query. So it's a thing over which we can apply stuff like Where, OrderBy and all those basic concepts. But EF Core doesn't actually take data from the database and apply the operators. That's the whole point here: we don't want to take the blogs in the database, which is, let's say, a database table, download all of that, and then apply the Where here in .NET via the code that we just saw in the .NET sources. That would be horrible for performance reasons. What we want is for this whole thing to get translated into SQL and sent to the server; the server will then evaluate this whole thing, which we express here via these operators, and send us back the results. I hope that's clear; that's not a trivial thing, what I just said. Yeah, that is the critical thing; that's one of the aha moments. For me it was many years ago now, but I remember having that aha moment,
like, "oh, this isn't getting executed; I'm not writing code here that is just going to get executed; it can write a SQL query for me." Exactly, exactly. So I want to dive into this and see what that actually means. Once again, keep in mind that the whole point here is to have this translated into SQL and sent to the database, for evaluation of these operators somewhere else, on a server. This is the whole point of IQueryable, basically. Okay, now I'm going to do something very simple here. I'm going to place two breakpoints, just to drive the point home. I'm going to remove the ToArray at the end, so in both cases nothing is going to happen; both constructs are very lazy. This one doesn't evaluate, and this one doesn't evaluate or translate or anything. So I'm going to do an F5. I hope the debugger will be visible enough, because I don't have a lot of real estate here on my screen, but let's see what happens. So this is building, running. Okay, you know what, before this I'll do a var x, because I don't want to throw away the result there. Yeah, exactly, I don't want to throw that away; I want to actually use the debugger to see what it is that we are constructing here, without actually evaluating it. So let's see what that is. Oops, there we go. I'm going to do a step over, so now x contains my thing. Note again that this has not been evaluated; I've not actually applied the Where and the OrderBy, because this is lazy evaluation. If we look at x at that point, I have something which is an OrderedEnumerable. If I look into it, it's going to be an enumerable composed over another enumerable, and that's eventually going to be composed over that blogs thing. But these are enumerable classes; they stay enumerable, and at any point I can call GetEnumerator on those enumerables and start getting the results out of that thing, and at that point the Where
operator will start kicking in, applying its filter, and the OrderBy will kick in, doing its ordering, and so on. Now I want to do the same thing... oh, but I'm actually missing something here. If I want to put a breakpoint, I'm just going to add something like this, put the breakpoint here, and remove this one. I want to show you what is actually there; how the sausage is made, once again. Maybe I'm going to need a bit more space for the debugger here, so I'm going to bring this up. So once again, the result here is this thing, and if you look at it you will see that this is a queryable. Now this is where it starts to be interesting. There's a whole lot of stuff here, which we're going to go into in a few seconds, not too deeply, but the one I want to concentrate on is this: this queryable contains something called an Expression. This is where the fun starts. What you see here is something called a MethodCallExpression, and if I drill down into it you'll see that inside that MethodCallExpression is a Method, and that method is OrderBy. So this represents the calling of a method called OrderBy. The method has not been called; nothing has been called here. What we have is something that represents the calling of OrderBy. If I continue looking inside this: this is the method, and these are the arguments to that method. If I expand this, there are two things there. The second thing is a lambda, b => b.Name. Remember, this is the thing that you see here in the code; you now see it in here. It's suddenly been magically copied into the structure somehow. The first argument is the thing to the left of the OrderBy, because it's an extension method. So we have a MethodCallExpression again; we just saw a MethodCallExpression, and now we have a new one, but this one is for the Where. So what I'm trying to tell you
here is that this expression that you see is basically a representation of this whole thing. Now, for anybody who has studied compilers at some point in university, or worked with syntax and that kind of stuff, this is not super advanced; it's pretty basic. This is what we call an expression tree, or sometimes an abstract syntax tree, an AST. I'm going to help you out and show this on a slide. Yeah, go ahead, Arthur. So I want to push back a little bit on that: if you're a compiler person and you understand compilers, then this stuff is pretty basic. Most of us are not compiler people, and expression stuff is actually pretty complicated. I also want to bring in a couple of the comments here, because I think this is really relevant to what you've just been saying. This one is from just before you did that debugging session, saying that this doesn't explain why we have IQueryable, because they're both lazy. But as you were showing, in the IEnumerable case what you've built up is code that will then get executed when you do ToArray. It's actual code; it gets run. In the IQueryable case we haven't built code that gets run. We've just built up a representation of the code that you wrote, an expression tree like you just saw, which shows the code that you wrote. It's not going to be run. As you're going to go into, we're basically going to translate that; we're going to look at it like a compiler would, in a sense, but it's not going to be run. They're both lazy, but that's the difference between an IQueryable and an IEnumerable, and why we have IQueryable. Exactly. We had a team member who used to refer to IQueryable as "double lazy", in fact. I'm not sure... I mean, it's a way to look at it; you can look at it this way.
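The debugger walk-through above can be approximated in code. This is only a sketch under stated assumptions: AsQueryable() over a hypothetical in-memory array stands in for EF Core's DbSet (so the root node differs from what EF Core would show), and the anonymous-type data is made up for illustration. What it shows is that the queryable carries an expression tree whose topmost node represents the OrderBy call, with the Where call as its first argument.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical in-memory data; AsQueryable() stands in for EF Core's DbSet here.
var blogs = new[]
{
    new { Id = 1, Name = "Zebra" },
    new { Id = 2, Name = "Apple" },
    new { Id = 3, Name = "Mango" },
}.AsQueryable();

// No ToArray() at the end: nothing is evaluated, only a tree is built.
var query = blogs.Where(b => b.Id > 1).OrderBy(b => b.Name);

// The topmost node represents the call to OrderBy (it has not been called).
var orderByCall = (MethodCallExpression)query.Expression;
Console.WriteLine(orderByCall.Method.Name); // OrderBy

// Its first argument is the thing to the left of OrderBy: the call to Where.
var whereCall = (MethodCallExpression)orderByCall.Arguments[0];
Console.WriteLine(whereCall.Method.Name); // Where

// Its second argument is the quoted lambda b => b.Name, a member access.
var keySelector = (LambdaExpression)((UnaryExpression)orderByCall.Arguments[1]).Operand;
var memberAccess = (MemberExpression)keySelector.Body;
Console.WriteLine(memberAccess.Member.Name); // Name

// The Where predicate b => b.Id > 1 is a binary "greater than" node.
var predicate = (LambdaExpression)((UnaryExpression)whereCall.Arguments[1]).Operand;
Console.WriteLine(predicate.Body.NodeType); // GreaterThan
```

Neither lambda is ever invoked in this snippet; the code only inspects the representation of the calls that the C# compiler and the Queryable operators built.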
Right. It's lazy, first of all, in the enumerable sense: it's lazy in the sense that we don't invoke these operators as we're calling them; we invoke them later. But IQueryable goes a step further and doesn't invoke them at all in the client, so to speak. They're never invoked as operators in the client. Yeah. What we get in the end out of this whole thing is what you see here on my slide; I did do some PowerPoint just so I could visualize this for you. Just quickly, can we move that message off the screen, the one StreamYard is showing? Of course, of course. And then people are asking about expressions and saying that the documentation is not there. We're going to go a lot more into expressions as we go through this, but we'll try and come back to that. We are, but I do want to say one thing about this. Now let me come back here; you know what, let's stop this debugging session, it's not that important. The nice thing about the way LINQ is done is that this thing here actually constructs an expression tree for you. Now, you can construct an expression tree yourself; there's an API for this. You would do Expression.Call, and this is a factory method for producing exactly that MethodCallExpression that you just saw before. But that would be, let's call it, dynamically or explicitly constructing the expression tree, whereas the amazing part of LINQ is that the expression tree gets implicitly constructed for you by the compiler. That's the magical thing: you don't have to know anything in order to write a LINQ query. You're using the exact same C# code that you're using for the enumerable version; you're going to use that for the queryable version, and behind the scenes that's going to construct that expression tree for you. And in fact, one of the things that we see is people trying to use that Expression API when they really don't need
to, and it is pretty complicated to build the correct expression with the Expression API, because the compiler is much better at it than we are. So if you can get the compiler to do it, which is what Shay was just showing (write the code, and the compiler builds the expression tree for you), always go with that. That's going to give you the best results, and it's going to be the most readable and everything. Basically, absolutely. There's a comment there; maybe I can bring it up, as long as we're doing a small break. This is a great question: what is the point of IQueryable aside from EF? So first of all, LINQ is very old. This is not something that was introduced two years ago; this has been around since the .NET Framework days. I don't even know exactly when, maybe .NET 4 or something, I don't know. 3.5. 3.5, yeah. So I do agree, first of all, that the best case for LINQ is translating to SQL; database querying is an amazing use case for this, and I'm sure that's what people had in mind when they designed it. EF is not the only database provider there; there are at least three others, even four. NHibernate is an ORM that has a LINQ provider, if I remember correctly; LLBLGen has a LINQ provider; and there was LINQ to SQL, which was a LINQ provider. So you have various other implementations of LINQ providers which translate an expression tree to SQL. But there were also some other examples. In theory, in any case where you want to express something like this, you could use a LINQ provider. For example, you could have a LINQ provider over an XML structure. One interesting idea that we had at some point: you could in theory have an in-memory structure, and you could build a LINQ provider over that which constructs indexes behind the scenes, to make sure
that the Where... so when you do a Where with an enumerable, what that Where does is dumb, right? It's going to go over your entire array and check each item. If you wanted to do this more efficiently, you could have a whole system behind the scenes where you construct an index in memory, like a small database in memory, and then the IQueryable Where would be translated to looking at the index rather than going through your full data structure in memory, which is like a full table scan, basically. So I'm trying to say that it's an abstract concept that can be applied to many different problems, but it's true that in practice I'm not aware of a big successful use case except for translation to SQL. Yeah. So when it was released, the three that I remember (and I'm not saying there weren't more) were LINQ to Objects, which is another name for the IEnumerable thing that we looked at, so it doesn't really use IQueryable; LINQ to SQL, which, well, we could go into the whole history of that, but it was basically an implementation of the same thing that EF is, and so EF was kind of a competing product from the same company, and those were both to SQL; and then LINQ to XML. At that time, this was before JSON and before YAML, and everything was in XML, and the idea there is that you do Where, whatever, and then it goes into your XML document and does the right thing there. I actually don't even know how that was implemented. But I think it's fair to say that, as of now, it's mostly useful for translating to databases. And of course the whole LINQ to Objects thing with IEnumerable is very, very useful. People talk about the perf, and there can be some perf issues, although I know there are ideas to try and improve some of
that, and there are allocation issues in some places. But in terms of just writing high-quality, readable code, especially in non-perf-sensitive paths, it's a very effective, very powerful feature that I don't think we should underestimate; it's just not what we're particularly interested in today. There is one other example that's not SQL, and that's GraphQL. We've actually been discussing this a lot, and I think there's at least one, if not two, LINQ providers for GraphQL. Not everybody wants this kind of thing, but it means you can use LINQ, exactly like with EF, to express some sort of query, and that would get translated to a GraphQL query and sent, via JSON, to some GraphQL server over HTTP. So that's another example showing that the concept is actually very powerful and applicable to various things. Anyway, I think we've talked enough about this. Yes, let's move on; we'll do some more questions a bit later. Yes. So I just want to make sure that this is very firmly in people's heads now. For this query that we've been looking at (blogs, Where this, OrderBy that), the representation in a syntax tree, or an expression tree, is going to be something like this. OrderBy is called on the result of Where, so you go from top to bottom, or from the end to the beginning, when you want to look at this. The root of this tree is this OrderBy method call. It has two arguments. Remember, this is an extension method, so everything that comes before is basically just the first argument to that method. The second part here is actually easier: this is the lambda. This is not the exact expression tree; I simplified it a little bit, and the actual expression tree is unfortunately a bit more complicated. But there's the lambda, and inside the lambda, the lambda body is a
member access expression, that's the b.Name, and that has a b and the Name. And on the left side of OrderBy you have Where. So the first argument to OrderBy is another expression; this is a very recursive structure. On the left side of that Where we have the thing representing blogs, the table in your database. This is the query root, or the DbSet in EF Core, so conceptually that should be familiar to people. On the right side, the lambda has a body which is a greater-than expression. That's a binary expression, because it has two operands: on the right side a constant of 1, and on the left side a member access, b.Id, which decomposes to b and Id. So I went through this exercise to give people a feel for what this looks like. Like I said before, for anybody who's dealt with programming languages, compiler people notably, this is bread and butter; this is the way it works absolutely everywhere. At some point when you're writing C#, for example, and the compiler has to compile it, it's going to parse your C# code into something like this. It's not going to be exactly this, it's going to be different, but the concepts are going to be the same: a tree structure with nodes, nodes containing other nodes; you're going to have a binary expression, and you're going to have a method call. Anything that you can write in C# code is eventually going to be parsed into this kind of tree structure, and the compiler actually operates on this thing, just like EF Core operates on this expression tree that you see. In fact, you can, and probably should, think of EF Core as a sort of compiler. We think of ourselves in that way, and we use that word a lot. So anyway, that's kind of what I wanted to say. Let me just check my notes here,
because I had various other things that I wanted to say. Right, I did want to show one more thing in the code before we move on. Just like we stepped into the implementation of the IEnumerable Where, where we actually looked at what Where looks like under the hood, I'm going to do the same thing for the Queryable Where. So I'm going to do F12, and we're going to jump to a completely different place, because this is a different method: it's a Queryable Where, not an Enumerable Where. If I look at this, it's going to seem a little bit weird. First of all, it's very short; it's not a long thing, as you can see. It also has nothing to do with what you saw before, which in itself is kind of interesting. I'm going to go through this quite quickly, and then we're actually going to see a minimal LINQ provider, just for fun. So you see we have the IQueryable, this argument here; that's the first argument, and the second argument is the predicate, which is basically that lambda. But I'll draw your attention to the fact that this is not a function; it's an Expression over a function. So this actually represents an expression tree in itself: that predicate, that b => b.Name, is actually an expression tree, a fragment of the expression tree that we're now composing on top of whatever came before. I think that's a very important thing to understand, even though often when you write the code it doesn't look any different. In the enumerable case, where we weren't passing an expression tree, it didn't look any different. But the fact that the signature you're passing into says "I'm taking an Expression" is significant, and that's why sometimes you can do something that looks like a reasonable refactoring in the language, like just extracting this into, say, a method call, and it stops working in the same way, because you stop now creating the expression tree
and you start just saying "call this method", and it becomes a black box: "evaluate this". Right. So again, if we go back to the Enumerable Where, you can see that this parameter is a Func of TSource to bool. I hope everybody's familiar with this: a Func is basically just a delegate, a function that we're passing around. Here it gets a TSource as a parameter and it returns a bool: whether to return the item or not, since we're looking at Where. If we look at the Queryable one, it looks much the same. You see the same Func here, but it's also wrapped in an Expression, and that's a signal to the compiler not to pass in an actual function here, but rather a representation of that function as an expression tree. Okay, now what actually happens here? We go to the source, which is the queryable. There's something called a Provider, that's our query provider, which we'll see in a second, and we call CreateQuery on it. So we create a new query over it, a new queryable to be exact. What this does is use that API that we briefly mentioned before, Expression.Call. This creates a new node in an expression tree, and crucially, it passes the old node to it as an argument. So it takes the old node, from before this Where is composed (blogs, or whatever it used to be), takes it out, and wraps it in a new call node. And you can see that there's also a Quote around the predicate, which is what we added here. Once again, I want people to be crystal clear about what's happening here. What we just did is construct this thing here: the expression was this, that was the queryable that we had before. We took that out and constructed a method call around it. We also put in the predicate, which is this thing; this whole thing is what is now the predicate, and we
return a queryable over this node in the expression tree. So to summarize, this is not that complicated: what the queryable version of Where does is basically compose an expression tree. We're constantly adding nodes on top of each other. Every time we add another operator in LINQ, we're just adding another node, one on top of the other. That's all we're doing. Nothing ever gets evaluated here; this predicate never gets invoked; nothing of the sort happens. We're only building up an expression tree that will eventually be translated into something else. Okay, do we have any questions around the stuff that I'm showing here? I think we should just keep going. There are general questions, but it'd be better to get through some more of it first. Okay. I'm going to do something a bit ambitious at this point and paste a minimal LINQ provider. Now, this is not going to translate stuff to SQL; I hope people are clear on this. This is a very proof-of-concept skeleton of what a LINQ provider might look like. So I'm just pasting this, and I'm going to show you what it looks like, just to give you a sort of feel for this. I'm constantly switching between windows here, which is why it looks a bit weird. So let's concentrate on this thing here. First of all, I have something called MyQueryable, and that implements the IQueryable that you saw. This is the interface coming from .NET, and this is a MyQueryable over TElement. So this is something that we can query, and it can return any type of element. Conceptually you can think of this a bit like a database table: we can go to the database and query this table, do Where, Select, OrderBy, all that kind of stuff. I hope I'll be able to explain this in a way that's not too complicated. A queryable has an expression, which is what
we saw. It's all very simple: all this is is an expression that represents what needs to be translated. It has the element type, which is what we're querying over at the end of the day, but that's not super important. And crucially, it also has this query provider. So there's a provider here as well that the queryable is wrapping. You can think about this queryable as something belonging to EF Core, to your specific LINQ provider, and behind the scenes it has a provider which does various stuff. For example, if I now ask the queryable to get an enumerator from it, that is basically telling the thing to get executed. When I come to this queryable thing and say please enumerate this, what I've actually just asked is please execute this query, return the results and stream them back to me. This is a very crucial point. This is what happens when you call ToList or ToArray, right? Precisely. ToList says, OK, I'm going to create a list and then I'm going to enumerate the thing, so it calls GetEnumerator, iterates over it, puts all the stuff in the list and returns it to you. So this is how you execute it, basically. Exactly, exactly like Arthur just said, and that delegates to the provider. So we're going to go down to MyQueryProvider, which is my very minimal provider for our LINQ implementation, and there are basically two methods here, there's really nothing more. The first thing is CreateQuery, which you saw in the other... do I still have this open? No. Let me go back to the Where implementation, maybe we should have this nearby. So you saw that what the Where operator did is it went to my queryable, extracted the provider out and called the CreateQuery method. So if I look at this CreateQuery method,
it's all very simple: it returns a new MyQueryable over the expression that's given to it. This is basically a hook, or an extension point, in case the provider wants to do any sort of manipulation. It doesn't really matter; what we're doing in this phase is basically just composing those expression nodes one over the other. What gets interesting is Execute, which as we saw gets invoked when we enumerate this thing. This is where the magic happens. Here, obviously, I'm not going to actually do anything, because I'm not going to write a LINQ provider here, that's a very complicated task. But what I will do is just a bit of debugging. So I'm going to do a Console.WriteLine, "executing query", with the topmost expression. I'm going to look at the topmost expression in my expression tree. Remember, it's an arbitrarily deep expression tree, it's very recursive, but I'm going to look at the thing at the top and write out its name. And then if it happens to be a MethodCallExpression, which it is in my example, I'm also going to tell you what the name of the method is. And here is the to-do, here's where we're actually supposed to translate to SQL, which is a small task, obviously. I'm obviously being facetious. And then it doesn't actually return anything. Let me get to that in a second, I just want to run this and then we'll address that question. So I commented everything else out, unless I've somehow made some sort of error. Oh, and I need to show you the code that actually runs this. So I'm constructing a MyQueryable, this is like our DbSet. I'm passing an array, which is the thing that theoretically we'd be querying over, but of course we're not really doing anything here. Then we're going to compose a Where, a queryable version of Where, with a predicate, and then we're going to call FirstOrDefault on
it. So in theory what we're expressing here is that we want to get back the first thing that is bigger than one, but of course my LINQ provider is not actually implemented. So I'm going to run this, let's hope there's no null reference exception. Of course there is something. Yeah, I left this thing here, so I'm going to take it out. Back up here. OK, so this actually worked. What this LINQ provider is doing is telling you stuff about the expression tree that you've constructed and handed to it. What it's telling you is that the topmost expression is a method call expression, and the method call is for the method FirstOrDefault. So this tree has two method call expressions composed on top of each other: FirstOrDefault on top of Where, on top of this constant expression here. This is basically what is happening here. The LINQ provider doesn't do anything with it, it just outputs some debugging stuff, but you could in theory now start implementing this thing and do whatever you want with that expression tree. It could interpret it, translate it in any arbitrary way, do something with it over SQL or over anything else, it doesn't matter. That's your implementation of your LINQ provider, which is quite a phenomenal thing, I think. It's a design that's very interesting, and by the way, I'm not really aware of something like this in most other languages. So this thing which you just saw, the queryable implementation of LINQ, is quite unique as far as I know. Of course, enumerable operators exist by now in every language, so every language allows you to write, you know, where this, order by this. The details vary, but this exists everywhere. Whereas this thing, where you use the same operators to express SQL, that's something that is very unique to C#, and I think very cool. It's a very strong point of C#. When people ask me what is
interesting about C# for database access, LINQ is my immediate response. This is a very unique feature, and if you've only done EF Core and C#, and you've never done database programming in other languages, then you don't know how lucky you are, in my personal opinion. I think LINQ is really awesome in that sense: it allows you to take these very basic building blocks, Where and OrderBy, which everybody knows and uses in memory, and apply them to express SQL queries, which is an amazing thing. And you just saw more or less how that works behind the scenes. OK, maybe let's bring back that question that you had. Yeah, sure. I think that's basically what you've just been talking about. I don't know, that's the other one, that was the first question. Yeah, so, interpreting the expression: this is the complicated bit, and this is the bit which we're probably not going to go into in detail. Sorry, I was answering some questions, so maybe you said this already, but this is where you use a visitor pattern and you have to interpret things. This is most of the work that Smit used to do in the query pipeline, and believe me, it's not easy, it's really difficult stuff. Yeah. I will say this: I'm going to show a bit of magic here, so that if somebody wants to play around they can at least start. And by the way, I think we're going to do another session, because this is not enough time, there's so much to talk about. We'll probably do another session where we'll talk about how you visit expression trees, and techniques for actually looking at expression trees and changing them and all that stuff. We'll talk about this, but I just want to show one bit of magic here. OK, maybe this will blow some people's minds, but I'm going to do the following thing. I'm
going to write something like this. OK, I just wrote this code here, and then like before I'm going to do something like this. This is actually not new compared to what we've already seen. What you see here has no functions, no LINQ providers, this is super simple. I'm showing off a capability of Roslyn, of the C# compiler. You know what, I'll start out simple. I hope people are aware that you can do this, let's start easy. This is a very simple C# construct where I'm defining a delegate, like a function, let's say, that accepts an int and returns a bool, just like for Where. At this point I can do Console.WriteLine, predicate, and then I can put parentheses and do eight. So I'm invoking that function which I just created up there, and that's going to evaluate: it's going to do mod 2, it equals zero, so it's going to give me true. I hope this is clear to everybody. I can run this, it's not really that interesting. Now I add Expression here, like we saw before when we looked at the function. That's a sign to the compiler that what I'm interested in is not compiling this code, but rather constructing an expression tree out of it. So if I run this and look at what predicate is, I'm going to find an expression once again. That's the whole point of what we've been talking about. I just want to show this in the debugger. If I look at predicate now, this thing is now an expression, and you can see that there's a body with a logical binary expression, that's the equality check, and if we go into this there's a left and a right, and the left is another binary, which is the i mod 2. So if somebody wants to play around with expression trees and see, for a given piece of C# code, what kind of
expression it translates into, then this is a nice way to do it. This is what I do sometimes: if I have some C# code and I'm not sure what it's going to look like as an expression tree, I do this and then look at it with the debugger. I see what the compiler has constructed for me as an expression tree, and that's my way of understanding how it is. Absolutely, that's what I do too. Yeah, exactly. After you've done this for a while you kind of know what it looks like, it's not the end of the world, but even though we do expressions every day in the team, it's still tough, it's still hard to keep this in your head all the time. Do we have any other interesting questions? We have quite a lot of interesting questions, but not necessarily on this specific point. We can just go answer some questions if you want and then come back. I'll tell you what, my next thing is starting to talk about what an actual LINQ provider looks like, which is EF Core. So if we want to still talk about conceptual LINQ questions, then maybe we'll go through them here. I think most of these are more relevant to EF Core: things like Include, things like the difference between ToList and ToListAsync. OK, so let's go on to EF Core. Let's move on, exactly, we're also getting on in terms of time, so we can move on. So now we've discussed what a LINQ provider is at a very high level and what it does: it's a component that gets a query tree as input, we've gone over this part. It does its thing: this is the input, the query, it does a whole lot of complicated stuff, and at the end of this it's supposed to give us back an enumerable that gives us the results of that query tree executed against, let's say, the database, or whatever. So this is basically
all it is: input is the query, output is an enumerable. This is what a LINQ provider does. This is what EF does in its capacity as a LINQ provider; it does other things as well, but in its capacity as a LINQ provider this is what it does. Now, in order to do its job, and now we're starting to talk about EF and databases, we're getting concrete, this complicated cloud thing here needs to produce two things. The first thing, which everybody understands, is that it basically needs to produce a SQL translation of the query tree. This is not trivial. This query tree that you saw represents your C# code, but the C# code doesn't look anything like what the SQL looks like at the end of the day. Superficially it might look like it, because you have a Where in C# and that needs to get translated to a WHERE in SQL. But obviously that's where the similarity stops. In SQL you don't have an ORDER BY composed over a WHERE composed over something else, it doesn't work like that at all. You have a SELECT expression which has different clauses. So this transformation between a C# expression tree and a SQL expression tree is quite a complicated task. Believe you me, this is really, really complicated, and there's a lot of work in this green cloud here to make sure that works. But that's just the obvious part of what we have to do. There's another, less obvious part, and that's the logic to read back the results from that SQL query and materialize them for you. If you've ever worked with a database outside of EF Core, like with ADO.NET, you send that SQL and what you get back is a table of results. That's a DbDataReader, to be precise, which represents a streamable table of results, like a result set: there are rows and columns. But what EF Core gives you, what a LINQ provider gives you, is not that. What it gives you is nicely materialized entity instances. It gives you blogs which you can work
on, because it's an ORM. That's the whole point of an ORM: to bridge the gap between tables in the relational database and objects in your program. That's the object-relational part of ORM. We usually call this process materialization. All it means is basically that we take that table of results and we materialize it, we hydrate it, as people sometimes say, into your own objects, your blogs and posts and so on. That piece of logic is called the materializer. It gets an ADO.NET DbDataReader, so the raw tabular results from the database, and its job is to produce Blog instances out of that stuff. So we have a server-side thing happening here, that's the SQL, and we have a client-side thing happening here, that's called the materializer, or sometimes the shaper. This process of going from the query tree here all the way to these two artifacts here, we call compilation. It really is, because it is in many ways very similar to what the compiler does. The C# compiler produces something called IL, and then the JIT takes that and produces actual binary machine code from it. We don't produce IL or binary, we're not a compiler in that sense. Instead we produce SQL and a materializer. But it is still compilation in some sense of the word, for sure. At least if you look at the type of code in EF, it looks like a compiler in many ways. It's just a different type of compiler, I would say. So that's a little bit about that. Now we're going to go deeper and deeper into this. First of all, just to drive a point home about server and client side. I find this extremely elegant, I love this personally. Let's say that our provider now encounters, you know, there's a context.Blogs, and now we encounter the operator Select and we need to process it in our LINQ provider. So before we saw the Select, this projection here,
what we had was a SELECT * FROM Blogs, and our job when we see that Select operator is to convert that to SELECT Id FROM Blogs. That is the meaning of Select when it is applied to SQL: instead of getting everything about a blog, we only want to get out the ID of that blog. So this Select operator which you see here applies a transformation to the SQL, to the SQL tree if you will, behind the scenes, which makes us produce not a star but rather an ID. Just so you know, we never actually produce a star, for various other reasons; we write out all of the column names. But the Select makes us reduce that list from all the columns to an ID. At the same time as we do that server-side thing, we also have to change the client-side materializer. Before seeing the Select, we have a materializer that needs to know how to materialize Blog instances from all of these columns coming back. But the moment we see the Select, we need to change that into something that instead reads only the IDs. And in fact, what that thing returns now is an IQueryable of int, it's no longer an IQueryable of Blog. In other words, the Select transformed the results coming out of the database, and therefore it effected a change on both the server and the client side of this query. If you think about another operator like Where, Where is very different. Where is going to make quite a difference here on the server side, we're going to have to add a WHERE clause in SQL, but it's not going to do anything on the client side, because Where doesn't impact the shape of your results in any way. It only impacts which rows you're going to get back, but they're going to have the same schema. We're not going to change which columns are coming back or anything like that. So this is for you to understand: in general, if you look at what the EF pipeline looks like, it's constantly working at these two levels. It's moving forward, tweaking the SQL,
tweaking the materializer, tweaking the SQL, tweaking the materializer, so that at the end of this process those two things come out of it and can be used to actually execute the query. Do you think that was clear, Arthur? Do you have anything? Absolutely. The only thing I'd say is, again, to think about what is ultimately happening. We absolutely would not want to bring back every property of Blog. We wouldn't want to do a SELECT * here, bring back everything, and then on the client throw away everything that we brought back except the ID. So a fundamental part of being able to write efficient queries and be fast is to be able to do these things, and there are many different aspects to that. The Where clause has a different characteristic. The Where clause says bring me back fewer rows, I don't want you to bring back all the rows, and often that is the most important thing you want to do, limit the rows. But then the Select clause says don't bring me back all of the columns, only bring me back some of the columns. Both of those things are really important to handle, both in terms of the SQL we generate and also, as you say, in terms of how we handle which rows are coming back in the materializer. So those things are critical. Yeah, very much, and as you said, perf-wise these are two operators that limit the amount of data coming back, and so they're really critical. You should always select only what you need, both in terms of rows with Where and in terms of columns with Select. So there are two dimensions of limiting data, basically. To drive this point home, like Arthur said, this is a good opportunity to introduce the AsEnumerable operator. I kind of like it. So we're in synchronous code now. In general I never do synchronous code, but I really want to keep things simple, so I'm using
synchronous code. But never mind, this is why we have ToArray and not ToArrayAsync. You should never use ToArray, you should always use ToArrayAsync, is what I wanted to say. If we do this, then this is the IQueryable version of Select, you can see IQueryable right here, and that means that the Select gets translated: EF Core sees it and manipulates its SQL, and therefore everything is efficient. But we could also do this. There's this operator called AsEnumerable, and the only thing it does in life, the only reason you would ever want to use this operator, is to terminate the query in terms of how far we translate to SQL, and to force the rest of this stuff to actually be enumerable. So if you run this query, what will happen is EF Core will generate a SELECT * with no column reduction, no nothing. The entire table is going to get downloaded here, and then this thing is going to be evaluated because of AsEnumerable, which is the thing that will call GetEnumerator. Remember, that triggers the evaluation and execution of our query. And at that point the normal Select, the enumerable one, is going to get called. Now we're in the enumerable universe. So this is a marker inside of your queries. You can have a Where here, whatever, the ID is bigger than two. Let me draw this out like this. So up to the AsEnumerable, everything here is translated to SQL. Then there's this big red stop sign here that says EF Core can go no further, a LINQ provider can go no further, and at that point anything else that gets composed is evaluated client-side. And we sometimes tell people... I'll give you an example. Let's say here you want to call some function which you wrote. It's a function you have, it's written in .NET, and you want to, you know, let's do something a
bit more, you know, Foo of ID is equal to, something, five. It doesn't really matter, the implementation of Foo doesn't really matter. Yeah, exactly, I'm now answering the question that's on the board. So if I were to try this, your query would immediately fail. It would fail, obviously, because this Foo thing is something that EF Core doesn't know. It's not one of the standard operators, it's some function that you made up, and we have no way of knowing what it should be translated to on the server side. So we can't translate this to SQL, it doesn't mean anything, and because of that we throw a query translation failure. We actually cannot do this. Now, if you insist that you must use this construct, you have the option of doing AsEnumerable, which is once again that stop sign that says: up to here translate to SQL, but from here evaluate on the client. This means that we're going to bring the entire table back just so that we can call your Foo function on the client side on each and every ID of each and every row, which is generally a horrible thing to do. You should never do this unless you really, really know that the table is going to be small and so on. But I hope it's clear: this side of what gets evaluated on the client and what gets evaluated on the server is absolutely, fundamentally critical to EF Core programmers. So I think there are a few questions that we should cover on that. Before we do, I just want to point out that in early versions of EF Core, so basically before 3.0, if we couldn't translate something we would sometimes do this for you automatically. Not always, but we would sometimes do an implicit client evaluation: we would bring back all the data and then do the filtering, or call a function, or whatever it is, on the client. We stopped doing that because it's just too dangerous. We would put a
warning out, but who reads warnings, right? And famously, Stack Overflow, which uses EF Core, had their site go down because they did this accidentally. We said to Nick Craver, didn't you see the warnings? And he said, the first thing we do is switch off all the logs, so we didn't see any warnings. So we stopped doing that. And so this is actually the workaround. If you've got an EF Core 2 application, which hopefully nobody has anymore because it's been out of support for ages now, but you're updating it, and you really have a place where, as Shay says, there's not much data coming back and there's not actually any perf issue, then putting in the AsEnumerable is the way to explicitly opt in to client evaluation. That's why there's no flag anywhere in EF to opt into it: AsEnumerable is the way to opt into it. You could also do ToList, ToArray, any of those things do the same thing, but then of course you're creating a data structure which you probably don't need, so generally speaking using AsEnumerable is better. But functionally it will do the same thing: it will bring everything back, put it in a list, and then on the client it'll go through the list and do the work. So that kind of relates to this question, about having had performance problems in the past because enumerables snuck in there. I don't know if you're referring directly to EF doing the client evaluation, but if that is it, then that's something that has changed now. Right, so it's important to say that it's not possible for IEnumerable to sneak in there anymore, like it used to be, as Arthur said. Nowadays anything you write on your Blogs is basically going to be evaluated on the server side or it's not going to be evaluated at all, with just one exception, which is the top-level Select, which doesn't really matter. Yeah. So you can be secure in the knowledge
that anything that's an EF query is always either going to get translated or it's not going to work, which is very important. So we already answered this, I think. Maurycy is in the chat too, answering; Maurycy is one of the people who works on the LINQ provider on the team. But I think the question is, if you do have this function, what do you do about it? To me there are two answers to this. One, you actually write it in a form where you pass an expression tree of the function that EF can translate. Sometimes people are only doing this because they don't realize, and again it goes back to the signature and everything, that they're passing this as a black box, and actually what's in that function could be translated if you do it in the right way. It's not trivial or clear. I don't know that we want to spend the time to actually show writing that, because it gets a bit ugly in terms of how you do it, but that is a plausible thing. The other thing you can do: sometimes it makes sense to have a database version of this function, so like a stored procedure, or a function actually, rather than a procedure, and you can then map your method to something in the database. This is often true. We do this for BCL functions, for example, in the Base Class Library, very commonly. You can use, for example, string.Join or something like that, and obviously we don't know what the C# code doing string.Join is, to us that's a black box. But we say, OK, those are the semantics that we have there, databases have functions that will do the same thing, let's do the translation there. And so you can also do that mapping. We have built-in functions, where there isn't a BCL equivalent, in the DbFunctions class, and you can write your own custom mappings either to existing
database functions or to database functions that you write, and so that's another way to handle those things. I don't know, did you want to add anything to that, Shay? No, no, it's perfect. And then I guess finally, in this kind of area, there's also an AsQueryable method, which I very frequently see people using like that, and it's absolutely pointless. Now people are answering in the chat with the one case where AsQueryable is useful, but when doing that, it's not useful in any way, shape or form. It's kind of a thing of mine, I wish people would just stop doing that AsQueryable. It is a queryable, you don't need to do it. Exactly, this is a no-op, this does absolutely nothing, because ctx.Blogs is already an IQueryable. Same thing here: the result of the Where is also a queryable, so there's no reason to do this, ever. Yes. So while we're... well, not that one. Sorry, this one scrolled on me: "Can't believe we did it automatically." You've got to take risks, you've got to try things. Sometimes they work out and sometimes they don't, and the idea that you can always get everything right... I mean, it's not like we didn't understand the drawbacks, but at the same time you can make arguments for it, and we won't go into that. And there's also more of an argument for it if you're a very immature LINQ provider that can't translate much; you can make an argument that it makes sense there. I think that's one of the things Bryce said: it made sense in the early days but not later. But I would just say, yeah, we tried this and it turned out that we were wrong and it was not a good idea, and that's why we changed it. That happens, that's software, that's life. OK, let's move on. I want to at least get to the end of my presentation. I also wanted to do a quick step through the code, but I'm not sure we're going to have enough time for that, but that's completely
fine. I do want to talk a little bit about the actual internals here, just a bit. So the summary of where we've got to up to now is: we have the query tree, we have this compilation phase, and then the outputs are the SQL and the materializer. I hope that's firmly in people's minds right now. The problem here is obviously that compilation takes very long. Producing both the SQL and the materializer is not something we can do every time you run a query, because that would make it extremely slow. So what do we do when we have a perf problem? Caching, always, that's the universal answer. So ahead of the compilation thing, which is the long thing, we put something called a query cache. I'll refer to it as our first query cache, because there's going to be another one. What does that mean? It means this is a cache which uses as its key the entire query tree. So this entire thing, which is a tree, is going to be the cache key. That means that when we see the same query tree twice, we don't need to go through the compilation again. The first time, the cache is empty, so we go and do this whole long thing, and then we remember the SQL that came out of that query tree, and we also remember the materializer that came out, and we store them in memory somewhere for you. There's going to be a cache there, with eviction, so we don't make it grow forever and all that kind of stuff. The next time you run this query, we're going to say, hey, we've already seen this query tree, and so we jump directly to the SQL and the materializer and bypass the whole compilation. This is critical, this is how EF Core can manage to be fast. Without this, EF Core would be extremely slow. So this is trivial, right? Put a cache in front of it, it sounds so easy. Unfortunately it's not as easy as that, and here I'm
going to have to do a segue into queries and parameterization. I hope people prick up their ears for this one, because this is actually very useful for users. This is not just an internal thing, you have to understand this if you're using EF. So if you write a LINQ query like this thing here on the left, you see that the five is inside that LINQ query. Oops, I completely mixed these around. So in your mind the arrows go like this, the arrows are crossing each other. Excuse me, I reviewed this myself and I missed it. So if you put the five inside your LINQ query, then what you'll see is the five inside your SQL. It's simple: if it's inside the query, it's going to be inside the SQL. However, if you do a var i = 5 and then you reference the i here, then what you're going to see inside your SQL is a parameter placeholder. This is a SQL parameter, and you don't see the five here, the five is delivered separately. Now this is a bit odd. First of all, if you look at this from an enumerable point of view, as pure C# code, there's not supposed to be any difference between this and that. These two bits of C# code are exactly identical, it is not supposed to matter if this i is from here or from there. It actually does matter a little bit; anybody who knows about closures knows you're actually doing something here that does matter. But at our level, let's say that it's supposed to be completely the same. If the value doesn't change, in C#... Exactly, it doesn't matter. It might be less efficient, though; performance-wise it might matter a little bit. Anyway, I'm not going to go into this on the C# side. So the point is this. First of all, it's very, very important in SQL in general, in databases, to
parameterize this value that you see here. If we were to use the same query and execute it once with 5 and another time with 6, we'd have different SQL texts. That means that the database, regardless of which database it is, SQL Server or Postgres, is going to have to recompile, or re-plan, let's say, these SQL statements every single time. That's a very bad thing in terms of database performance. So you want to parameterize, so that the server actually sees the same SQL every single time, and that's a very significant performance boost. What we decided to do in EF Core was to say that if you use a closure variable like this (this is called a captured variable: something referenced inside the lambda that comes from outside the lambda), the captured variable is the signal to EF that it should parameterize this thing. This is why you see a parameter placeholder here but you don't see it there. So that's a very important thing. Parameters are also important for preventing SQL injection, but that's less relevant in our specific case: we could have prevented SQL injection even while integrating constants into the SQL. This is really about performance.

I hope this point is clear to people. When you write your queries, keep this in mind: if you're going to write ten queries where b.Id is bigger than 5, 6, 7, 8 or 9, then it would actually be much more efficient to extract that value and have a different i every time, rather than embedding the 5, 6, 7 inside. There's a very noticeable difference. With that out of the way, why am I talking to you about parameterization when we were just talking about caching? The problem is obviously, if I go back... There's an interesting question here: if it's a const, do you know what goes in the expression tree then? Sorry, I don't know off the top of my head. Me
neither. That's a task for somebody: try it and look at what the expression tree looks like, like Shay was showing earlier. Write it and see what the expression tree looks like. Interesting question. It's really a question for Roslyn, right? It's not within our domain; we just get that thing. So, moving on.

Now for the problem: parameterization causes a little bit of a problem for us. For all of this to work, we have to upgrade our architecture and add this box here, which is called parameter extraction. Parameter extraction is a process where we take the original query tree and find those captured variables (this i here is a captured variable), and we extract them out. If we didn't do this, then each query tree would have a different reference to a different closure object here, because this is just how C# works. In other words, if we didn't extract those parameters, every time we would see a different query, and our cache would never give us a hit. I hope that's clear. If you actually look at what the expression tree looks like for this i, you will find that (again, we're talking about closures) it's a field access over a closure object, an instance of a compiler-generated closure type, and every time that's going to be a different object. Therefore we have to normalize all of these: we have to make sure that all the executions of this query are normalized into the same expression tree, so that our cache can function properly. So what we do in this step is go over the tree and punch holes in it: wherever we see this i, we take out the thing that's in there, put it somewhere else, and put in its place a parameter expression that's going to be the same every single time. That makes the query trees identical, and this is what powers this whole thing. So before we can do our actual
cache lookup, we have to do this parameter extraction. Is that clear, do you think, Arthur? Is that reasonably clear? That's clear, yes. OK, so we're progressively making the way this works a little more complex, but that's going to be OK. Now, we do actually have a question about compiled queries; I've been waiting for this, so let me just bring it up. I have that slide coming up just now. Oh, the slide is coming up? Yeah, I'm going to talk about it just now. OK, so here's the question: how do I get the most out of CompileQuery and CompileAsyncQuery?

OK, so I'm going to explain this now. Lots of people have heard about this feature called compiled queries, and I don't think anybody knows what it actually does, so this is the moment where I'm going to explain it. So this design is nice: it makes our queries very fast, and the parameterization works for all the queries even with different values of i. Whether i is 8 or 9 doesn't matter; we're still going to get the same hit in our query cache, get the same SQL and the same materializer, and everything's going to work well. It's all perfect. The problem is that parameter extraction takes time. Every single time you run this query, we have to go over the entire expression tree. They're usually not big, but it is still a recursive visitation of the entire query tree. And in addition to this... yes, exactly, I'm getting to this: parameter extraction is one source of slowdown, but there's also the cache itself (very well done on this question). Caching something where the query tree is your key is not trivial. This is a normal cache, conceptually like a dictionary. First we have to calculate a hash code. So what does it mean to calculate a hash code for a query tree? We have to
go through most of the nodes, or some subset of the nodes, and calculate the hash. So we've already gone recursively over most of the tree once here. We should probably improve things in EF Core, but that's the way things currently work. And then we have to implement equality as well: we have to compare our expression tree with all the other candidate expression trees in the cache, and that is a recursive, node-by-node comparison to see whether there's even one single node in that tree that differs from the version we have cached. That's heavy. Usually when you do a cache, it's over an int or a string, and that's really fast. A cache over a query tree is not fast in absolute terms; it's not something that is considered very high-perf.

Because of these two things, because caching on a query tree is heavy and because parameter extraction also adds a little bit of overhead, we introduced something called compiled queries. What compiled queries allow you to do is bypass this entire mechanism as well, so it's yet another form of optimization. But what you have to do instead is compile manually (I'm not going to say precompile, because that word means something else now): you have to explicitly tell us ahead of time that you want to compile a LINQ expression. This is literally what it looks like: there's an API, EF.CompileQuery, where you tell us, I'm going to run this query, which searches for all blogs where the ID is bigger than i, and you get back a delegate, a function getBlogs, which accepts a context and that value for i. At that point you're holding that delegate in your hands, and you can use it whenever you want after compiling it. For example, I'm going to call getBlogs, give it my DbContext instance and an i (8, 9, whatever), call ToList on that, and get the results
back. So this is basically a way to jump over these two steps here, which are still somewhat expensive. Now, I'm saying they're somewhat expensive, but in a real application, which is going to actually access a database, do I/O and go over the network, this is negligible. So I don't think people should now run to their code and start using compiled queries everywhere; that would be wrong. It makes your code much less readable and much less fun to write, and it reduces maintainability, because you now have to manage your delegates and access them. That's not the way you want to write things. (Yeah, see, that's what I'm afraid of.) What I want you to understand is that you should be writing LINQ queries in the normal way, and this mechanism is more than sufficient for the vast majority of cases. But if you know that you have a very hot query somewhere, and you've benchmarked and seen that using a compiled query actually helps in a real-world scenario, then you have the option of using it. I strongly encourage everybody not to use this API without first benchmarking and seeing an actual proven difference.

I was answering a question; did you talk about the composability aspect of compiled queries? No. OK, so I think that's a very important thing that is not immediately obvious. One of the things that's showing up quite a lot in the questions is writing a query in stages, which is a very normal thing to do. Shay and I argue about how common this is, but it's a powerful part of LINQ: I can start with blogs, then if my user has added a filter, put a Where clause in there, and if they need ordering, put an OrderBy clause in there, and build up my query by composing over different parts of the query. With a compiled
query you can't do that, because in a compiled query we need to know the entire thing end to end. So you can't write a compiled query and then put a Where clause on it. If you want both of those, you have to write two compiled queries, one with and one without the Where clause. That can be a significant price to pay if you were to try to use this everywhere. So as Shay says, using it where you know you have a hot query, or a couple of hot queries, can make sense, but if you use it everywhere you'll end up writing code that's not as nice and is harder to maintain. In the legacy EF6 code base (and this is, in my mind, an example of making a mistake), you could write a compiled query and then we would allow you to compose over it, but it then silently stopped being a compiled query, and it wasn't obvious that that had happened, which I think was terrible. We don't do that now; we just prevent you from composing over it. Exactly. The whole point here is to cache the SQL and the materializer, to jump directly to the things that are needed in order to execute a query, so obviously you can't support composability without giving those up.

So yeah, anyway: profile first, optimize later. Thank you very much, that's exactly it. Once again, we run the TechEmpower benchmarks, where we push this to the max; compiled queries definitely show up there, they're significant, and it's important to use them. But those are very extreme scenarios. I really discourage people from using this just because it's faster: it is not going to matter in the vast majority of cases, and it's going to degrade your code quality. So don't just do it; profile, benchmark. Actually, I would recommend benchmarking rather than profiling.
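As a rough illustration of the compiled-query API described above, here is a minimal sketch. The Blog entity, the BloggingContext class and the BlogQueries holder are hypothetical names invented for this example; only EF.CompileQuery itself comes from the talk. It assumes the Microsoft.EntityFrameworkCore package is referenced, and a database provider would still have to be configured before the delegate is actually invoked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity, used only for illustration.
public class Blog
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// Hypothetical context; the provider is supplied through the options.
public class BloggingContext : DbContext
{
    public BloggingContext(DbContextOptions<BloggingContext> options)
        : base(options) { }

    public DbSet<Blog> Blogs => Set<Blog>();
}

public static class BlogQueries
{
    // The whole query is declared once, end to end. The returned delegate
    // skips parameter extraction and the query-tree cache lookup on each call.
    public static readonly Func<BloggingContext, int, IEnumerable<Blog>> GetBlogs =
        EF.CompileQuery((BloggingContext context, int i) =>
            context.Blogs.Where(b => b.Id > i));
}
```

A call site would then look something like BlogQueries.GetBlogs(context, 8).ToList(). As the discussion above stresses, nothing can be composed onto the delegate afterwards, so a variant with an extra Where clause would need its own compiled query, and the delegate is only worth the loss in readability where a benchmark shows a real difference.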
By the way, profiling is complicated; benchmarking will at least give you an answer of, you know, this is what it was before and this is what it is now. Do that first, and then use profiling as an accessory tool to figure out where exactly the time goes. Yeah, exactly.

OK, I want to move on; now things are getting interesting. Remember we discussed the materializer: the materializer is the thing that reads back the results and materializes them into your POCOs. Now, this is maybe a little bit complicated, but we want the absolute best perf. So what we do when we generate that materializer in the compilation phase is code-generate a materializer for that specific query, specifically tailored to the results coming back from that query, and that's the thing we're going to use and also store in our cache. What this means is: let's take a simple query, it doesn't really matter which. If we take a query like where b.Id is greater than i, we know that what's coming out of that query is blogs, we know exactly which columns that involves, and so on. We know at that point, when we're compiling this query, that we can generate code that's going to read that and materialize it into blogs. If instead we had general code which, every time it needs to read back the results, says: OK, which fields does a Blog actually have? I'm going to look in the model; Blog has an Id and a Name, so I need to read back an Id and I need to read back a Name; if we had general code doing this every single time, that would be slow. At some point this was benchmarked, and we saw that showing up. So what you do in that kind of situation is, instead of having general code that does this, you have general code which generates specific code that does that job. So it generates specific code that is already
written with the knowledge that there's going to be an Id and a Name, that they're going to be an int and a string, and that we're going to read them in this way and not in that way. So we basically use what's called runtime code generation in order to do materialization in a fast way.

So the point here is: people often talk about handcrafting. They say, I can handcraft my query, which is one thing, and one of our goals in the SQL generation part that we went through earlier is to be able to generate the query that you, if you were an expert, would ideally handcraft. At the materialization level we also want to handcraft it, essentially. What we want to do is write code, as I was saying, that is handcrafted for the particular result set that's coming back. It knows whether this thing is nullable or not nullable; it knows that this is of type int, so I don't want to box it into an object and then ask, oh, what type is it? No, I want to take the int off the wire and put it into the class without doing any kind of conversion and without looking up any metadata about it, because we know all that up front. So we can use our up-front knowledge of the model and the shape of the results coming back to handcraft code that will be as efficient as possible. That's a fundamental part of EF being virtually as fast as Dapper in, for example, TechEmpower Fortunes, which came up in the chat: we handcraft this code; we're not doing general things. It's also, I think, a place where a fairly complex ORM like EF has an advantage over something that is perhaps a lot simpler, which basically just goes through and says: if it's this, then do that; else if it's this... That's simple to write and easy to understand, but it's not nearly as fast as having
this very directed generated code, and that's a real advantage of using something like EF to do this.

I wanted to show just a very simple example of what it looks like when you want to code-generate something at runtime. Interestingly enough, we're going to go back to those expression trees that we talked about before. It's the same thing: we're going to use expression trees to express what we want to achieve, but instead of translating them to SQL, we're going to compile them to IL, to actual .NET code. It looks like this; I just wrote this code out. You see there's an Expression of a Func, and this is the code in question. You can call Compile on it: this hands the expression tree, which could be a very deep and complicated construct (basically the EF Core materializer), to the compiler, and what we get back is a delegate, filter, which we can now run. I don't know if I still have this code here... yeah, I do. Remember I showed this thing here, where we did this with the Func? This is basically the same thing, except that here we first build it up as an expression tree, and then we instruct the compiler to compile that expression tree into actual .NET code, which we can then run. This is what EF Core does; this is literally what happens whenever you execute a query. It builds a big expression tree which knows how to read back the results of that query, then it compiles that, and the result is used every time you execute your query. And crucially, that is the thing that gets cached. So this is again where the query cache is so important: if we had to do this every single time, produce an expression tree and compile it, well, these are very heavy activities. So all we have to do is do it once.
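The slide code is not captured in the transcript, so the following is only a hedged sketch of the same idea using the standard System.Linq.Expressions API. The Blog type and the BuildFilter and BuildMaterializer helpers are hypothetical names made up for this illustration; EF Core's real materializer is far larger, but it is built and compiled along the same lines.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity, used only for illustration.
public class Blog
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class ExpressionCompileSketch
{
    // Builds the tree for b => b.Id > threshold by hand, then compiles it
    // into a delegate that runs as ordinary .NET code.
    public static Func<Blog, bool> BuildFilter(int threshold)
    {
        ParameterExpression b = Expression.Parameter(typeof(Blog), "b");
        BinaryExpression body = Expression.GreaterThan(
            Expression.Property(b, nameof(Blog.Id)),
            Expression.Constant(threshold));
        return Expression.Lambda<Func<Blog, bool>>(body, b).Compile();
    }

    // A toy "materializer": generates (id, name) => new Blog { Id = id, Name = name }.
    // The property names and types are baked in when the tree is built, so the
    // compiled delegate does no metadata lookups and no boxing when it runs.
    public static Func<int, string, Blog> BuildMaterializer()
    {
        ParameterExpression id = Expression.Parameter(typeof(int), "id");
        ParameterExpression name = Expression.Parameter(typeof(string), "name");
        MemberInitExpression init = Expression.MemberInit(
            Expression.New(typeof(Blog)),
            Expression.Bind(typeof(Blog).GetProperty(nameof(Blog.Id))!, id),
            Expression.Bind(typeof(Blog).GetProperty(nameof(Blog.Name))!, name));
        return Expression.Lambda<Func<int, string, Blog>>(init, id, name).Compile();
    }
}
```

Calling BuildFilter(5) gives a delegate that accepts a Blog with Id 8 and rejects one with Id 3, and BuildMaterializer() gives a delegate that turns an int and a string into a populated Blog. Building and compiling the tree is the expensive step, which is why the resulting delegate is the thing worth caching.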
Because we've already cached it based on the query tree, we've got that delegate after compilation, and we can just pluck it out of the cache and, bam, run that query super fast. So this is how things work.

One last complication; I know we're almost out of time. Arthur, questions? I think we should go through this first. OK, so I'm going to try to breeze through this. Sorry. No problem, I'm going to keep going. So, we have a problem with SQL: SQL is tricky around nulls. This is the last complication in my diagram. Anybody who's used SQL knows about this. Look at these two things. I did it again, I messed this slide up; sorry, I should have reviewed it more carefully. On one side we have string? s = "foo", and then we use s in the query; on the other side, imagine that instead of "foo" we have null. That was the point here. If we want to translate this to SQL, unfortunately we have to take the nullability into account. For any non-null value in SQL, we just do WHERE Name = that parameter, and that's fine. But with null, we have to use the IS NULL operator: there's a special way in SQL to ask whether something is null, and you can't use the equality operator. That's just how SQL works. I'm not going to go into it, because that's not what we're talking about here; it's a very fundamental design feature of SQL. But unfortunately what it means is that whether this parameter is null or not affects which SQL we have to send to the database. I hope that sentence is clear, because it's not trivial: the null-or-not-null state of a parameter affects the actual SQL. If we could have the same SQL and just send null or not null in the parameter,
we wouldn't have a complication, and everything would be perfect. But life isn't perfect for us, so this basically breaks our mechanism. If you look at this: we've extracted the parameters, and then we do the lookup in the cache based on the query tree after parameter extraction. But if we've extracted the parameters, then we don't know whether they're null or not, and we can't have the same SQL for both cases. So this breaks our design, and unfortunately we have to add another thing and make this diagram a little bit more complicated. That's what I said: we can't just cache on the query tree.

This is the final diagram; this is the end. It looks maybe a bit more complicated, but it really isn't that bad. At the end of compilation we don't get SQL and a materializer. We do get the materializer, but what we also get is a second query cache; that's why I've been calling the other one the first query cache. The key of the second query cache is basically whether each parameter is null or not, so it's like a set of flags: is the first parameter null, yes or no; is the second parameter null, yes or no. We basically encode the nullability of all the parameters, and that's our key. Then we do the exact same thing as before. The first time, this cache is empty, so we go into a second phase of compilation, which I'm calling here parameter-aware compilation. This is relatively short, and specifically what happens here is the actual SQL generation. So SQL generation happens after we've already passed the cache which knows about the nullability of the parameters. That means that in this phase we know, for each parameter, whether it is null or not, which means that the SQL can be tailored accordingly. The end result is the SQL. The second time we come here, the cache for this already
contains an entry for a parameter that's null or not null, and so on, so we can jump directly and get the SQL. So to summarize, you have a multi-layered caching system within EF Core's query pipeline, and this is fairly fast at the end of the day, when the system is already hot or warm, meaning we've already executed the queries once after startup. We get a query, we do the parameter extraction, we do the lookup in the first query cache and get a hit, and we jump directly to the second query cache. At that point we compute the cache key from the nullability of the parameters, check that, and get a hit as well. Then we have the SQL and we have the materializer, and we can execute the query. So we just do these two jumps, and jump over both the first part of the compilation and the second part of the compilation. But of course when a new query comes in, we have to go through this whole journey the very first time.

OK, so there are a few questions about this, and I think this is a good point to go into them. Very good point. OK, so you just answered this one, basically, which is: does it get compiled every time I run? Wonderful question. Today the answer is yes. However, maybe some people are aware that for EF Core 8 we're currently engaged in a Native AOT experiment, which would basically move this whole thing out of runtime. You would do this ahead of time, and when your application starts, these caches will magically be pre-seeded with all of your queries. This is where we're trying to get to, and that means that your application startup will be far faster, because you no longer have to compile stuff at runtime.

Then there's also a question that always comes up: what about cache eviction, what happens? The cache has a pre-configured maximum, which if I remember correctly is a thousand
queries; I'm not completely sure anymore, but I think it's a thousand queries. It does LRU, if I remember correctly, so if you need to add a new query and your cache is full, it's going to evict the one that was least recently used. You can tweak those numbers if you want, but it generally works OK. So we're going to cache a thousand queries globally. By globally I mean in your singleton scope; let's say globally to make things simple, assuming your application has just one service provider. One thousand queries are going to get cached by default, as long as you're not doing something by accident where you're generating a new query each time. I would love to talk about that, but we don't have enough time: there are certain erroneous patterns of dynamic query generation which can make this break down in a very bad way. As long as you're doing the right thing and keeping it simple, it's very rare to have a thousand completely different query shapes, and I'm not talking about parameters; once again, parameters are out of this picture. I mean completely different trees. It's very rare to get to this, so everything tends to just work.

Specifically about the nullability thing, there's a question here about using nullable reference types. This actually happens now, in the sense that the model knows whether a particular property is nullable or not, and nullable reference types are one of the ways used to determine that. It knows whether or not this thing can ever be null, so in that case, yeah, that already happens. To make Arthur's point crystal clear: what you've defined in your model for a property affects what kind of SQL we're going to generate. So if a column is nullable, generally you're going to
see far more complicated SQL, because we have to take into account the case where it's null. So don't make your columns nullable unless they need to be; that's basically what I'm saying. If you don't need to hold nulls there, tell us that, and we'll generate better SQL for you. Now, NRTs only come into the picture for configuring your model for that property; that's it. You can also configure your model directly via the fluent API, regardless of NRTs. NRTs are just a signal to EF Core that the property in the model should be nullable or not; that's all. Otherwise NRTs have absolutely no role in EF Core's life. So you can put a Required attribute on a property, or you can call IsRequired in the API, and it will do the same thing. Exactly.

So this is definitely relevant here, and relevant to AOT: if I have four reference-type parameters, will that explode, and, related to that, what if I invoke it with all 16 different combinations? First of all, yes, exactly: if you actually run it with the 16 different combinations, then that may happen, and again it will use the normal caching rules, eviction, LRU and so on. I just want to say this has nothing to do with reference versus value types, because the word reference is in the question. You can have a nullable value type; the parameters here are on the .NET side of things, so you can also have a nullable value type, and that can be null as well. Anyway, we can move on.

OK, so we talked about that. We're going to just go back and answer some of the more general questions as we can. I probably won't get to them all; I'm sorry if I don't ask your particular question here, because we've had a lot of questions, so
we appreciate everybody who asked them, and we'll try to cover some of these. So let's see. Oh, just quickly, to be clear about the const thing, which some people knew and some didn't: somebody did try it (no, that's the wrong one... here we go), they just checked, and it's the same as using a literal. So a const doesn't get parameterized, which makes sense; that's the way it should be, I think. On the client evaluation breaking change you all did: yeah, this is why we have to make some breaking changes, even though they can be painful sometimes. It's got to be a balance. Let's see... yes, querying large strings and varchars: that's the SqlClient issue. Go put your vote, your thumbs up, on the SqlClient issue. It's a difficult problem, but that's a whole other standup.

Let me try to find the most interesting ones here. OK, here's a fairly simple one: if I select just two columns, is that also executed on the server? Yes; we'll bring back the two columns that you asked for, or however many you asked for. That's again about being efficient in what we bring back. Oh, this was a good question: how does the database provider factor into all of this? That's something Shay knows a lot about. In principle, this diagram that you see in front of you is exactly the same; the database provider doesn't figure into it. Where it does obviously come into play is within those boxes, compilation and parameter-aware compilation. There are various ways for the database provider to insert functionality into these green boxes, which are black boxes in this specific talk: we don't know what's going on inside, because that's not what this standup is about. Obviously this is where everything gets customized. For example,
Postgres needs to generate different SQL than SQL Server does, so somewhere in this parameter-aware compilation box is something called the QuerySqlGenerator, and the Postgres one is obviously customized and overridden and so on. But that's only within these two boxes, compilation and parameter-aware compilation; aside from this, it is exactly the same thing. I will say one thing: EF Core is not just for relational databases, and it's not just about SQL. We also have Cosmos, and other providers which are not at all relational. For those providers, everything I said about SQL is irrelevant, and you don't have a second query cache, because there is no SQL. This whole problem with the nullability of parameters is a SQL problem, so another kind of database will not have it at all. So with other providers the diagram is effectively going to be this, because you don't have that specific SQL problem; you're just going to have whatever the provider does, the materializer, and so on. I hope that answers that question. Absolutely.

OK, let me hide that one. From a friend, Georgie, here, pointing out that there are libraries you can use to help you build expression trees dynamically if you need that. Although I would say again, use the compiler if you can; but you can use some of these libraries to help. And one thing I want to mention about the materializer: you showed the code, Shay, on your screen, and you said if I write this, it produces a Func, and that's the same thing the materializer is doing. But of course the difference is that the materializer is really doing that dynamically, looking at the model metadata and the query shape. So I wanted to very quickly add this to the screen. This is a part of the materializer; this is called the EntityMaterializerSource,
and you can see it's basically doing (everybody cover your eyes) the things that we talked about. It's a bunch of functions that call into these other things, which do things like create member assignments and add them to a list of block expressions. One of the interesting things: there was a nice IfThenElse up there... yeah, this is a good one. This is about materialization interceptors. We want you to be able to intercept materialization; we added these interceptors, I think in EF7, and that's a valuable feature, but if you're not using them, it should have zero perf impact. So basically this only gets called if you've got an interceptor; we'll only put this into the tree then. There's no code that runs at runtime and says, if there's an interceptor here, do this; if there isn't an interceptor, we don't even have that code there. But if there is one, then we create these block expressions, create new objects, bind them to things, call methods on them, and we're basically building up this big list of things. And then, if there are properties, do this thing; if there are no properties, do a different thing, because we know up front whether there are properties or not, so we can do different things. So this is using expression trees to build this dynamic code, and to me this is the funnest part of EF. I love it, but it's not easy either. And then at the end of it, we put all this stuff together and create a lambda (this one is only used in a certain code path), which essentially takes some information and puts your query result out the other end, and that's
essentially how it works. So anyway, there's that.

OK, so we talked about interpreting expression trees. There's a question here about how we handle new C# features, and that's kind of interesting because there are really two different parts to this: there are things that actually change the generated IL or the generated expression tree, and things that don't. A lot of things like init are handled by the compiler differently, but as far as what EF sees, it doesn't actually make any difference. If the property setter is init, it's a setter; we can call it. We don't care whether it's init or not.

required is a bit different, in the sense that the compiler actually doesn't let you do these things. It's not just a warning, and it's not something you can necessarily work around if you're trying to generate the code for AOT. For the AOT stuff, which we haven't gone into detail on here, some of those things can be difficult. When we're dynamically compiling the code, again, we don't care whether required is there or not. It doesn't matter; it's a compiler thing. The IL we generate is the same: the expression tree generates IL, the IL compiles, it's all fine, we don't care. But if we were to try to do that ahead of time, as C# code, now we have to do different things. That's a topic probably for another standup, but it's a very interesting intersection between language features and LINQ.

Let's see, we're going to have to skip some of these questions. We're really over now, way over. OK, so one final thing: there's always the other side of the fence. This person really loves client-side evaluation. Like I said, I think the general consensus is that it was a bad idea, but it's not without its value that people see as well, as is always the case in software. So anyway, I think we got to
the right place. AsEnumerable is a fairly easy and very explicit way of saying "I'm OK with it here." Yep.

OK, good stuff. There's always more stuff we can go into here. Give us some feedback: tweet at us, or there's a pinned issue on GitHub for providing suggestions. What would you want to see? Do you want to see more about SQL generation and the expression visitors there? I'm throwing Shay in the deep end here now. If enough people want it, I'll be happy to do another one like that. We would go into the boxes, basically the compilation and the parameter-aware compilation boxes. Yeah.

Anyway, let us know, and thanks for all the questions. That's great, and I'm sorry we didn't get to answer all of them. I guess we'll say goodbye for now, and we'll do another one of these soon. Thanks everyone, bye. [Music] Thank you.
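Editor's sketch: the materializer interceptor point made earlier in the transcript (the interceptor call is only put into the expression tree when an interceptor exists, so there is no runtime "is there an interceptor?" check) can be illustrated roughly as follows. This is not EF Core's actual EntityMaterializerSource code. The names Blog, MaterializerSketch and Build, and the object[] "row" standing in for a data reader, are all hypothetical and chosen only for illustration; the only real API used is System.Linq.Expressions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical entity type used only for this sketch.
public sealed class Blog
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class MaterializerSketch
{
    // Hypothetical helper (not an EF Core API): builds a delegate that turns
    // a row of values into a Blog. The decision about the interceptor is made
    // once, while the tree is being built, not every time a row is read.
    public static Func<object[], Blog> Build(Action<Blog>? interceptor)
    {
        var row = Expression.Parameter(typeof(object[]), "row");
        var instance = Expression.Variable(typeof(Blog), "instance");

        // The "big list of things": create the object, then assign members.
        var body = new List<Expression>
        {
            Expression.Assign(instance, Expression.New(typeof(Blog))),
            Expression.Assign(
                Expression.Property(instance, nameof(Blog.Id)),
                Expression.Convert(
                    Expression.ArrayIndex(row, Expression.Constant(0)),
                    typeof(int))),
            Expression.Assign(
                Expression.Property(instance, nameof(Blog.Name)),
                Expression.Convert(
                    Expression.ArrayIndex(row, Expression.Constant(1)),
                    typeof(string))),
        };

        // Only add the interceptor call to the tree if there is one.
        // Without an interceptor, the compiled delegate contains no trace of it.
        if (interceptor != null)
        {
            body.Add(Expression.Invoke(Expression.Constant(interceptor), instance));
        }

        // The last expression in the block is the block's result.
        body.Add(instance);

        var lambda = Expression.Lambda<Func<object[], Blog>>(
            Expression.Block(new[] { instance }, body),
            row);

        return lambda.Compile();
    }
}
```

Calling MaterializerSketch.Build(null) yields a delegate with no interceptor code at all, while MaterializerSketch.Build(b => Console.WriteLine(b.Name)) yields one that invokes the callback after the members are assigned. In both cases the delegate is then called with a row such as new object[] { 1, "EF" }.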
Info
Channel: dotnet
Views: 12,876
Keywords: EntityFramework, beginner, csharp
Id: 1Ld3dtnTrMw
Length: 104min 58sec (6298 seconds)
Published: Wed Mar 08 2023