Entity Framework (Core) Unchained: Getting the Best Performance from Your ORM - Dan Mallott

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
right uh we'll go ahead and get started so uh this is entity framework core unchained um getting the best performance from your orm the spoiler is not don't use entity framework core so just want to get that out of the way um so we're going to start out uh we're going to talk about configuration um arguably one of the more important pieces is making sure you've got your configuration right talk a little bit about access patterns any framework does some very interesting things with some crud operations we'll learn about the really really easy foot guns to find and there are a couple we are of course going to touch on database design because if you have a poor database design nothing else that i'm talking about is going to matter um and then uh lastly we're just gonna have some honest conversation about things you shouldn't do with the f core i've got a couple of war stories there um including a recent one where i did not follow my own advice he's good right uh who am i uh so my name is dan malott i'm a senior principal for west monroe uh in our product engineering practice west monroe is a business and technology consulting firm based primarily in the united states we recently opened an office here in london they're primarily focused with mergers and acquisitions i.e making money off of other people's work there are less charitable ways i can describe it i've been developing software since 2011 in various capacities uh my first job i was the dba uh not necessarily voluntarily um but now i primarily do development work although a lot of times that means developing powerpoints uh primary experience in microsoft i've also unfortunately written that four letter language and i am a usa hockey referee i put ice in parentheses because hockey here means something different than it does in the united states right twitter github linkedin if you want to get a hold of me right so i'm gonna assume you guys all know this but i'm gonna go over it anyway um so any framework core is made up of two components uh there's an object relational mapper which is taking the results of queries and mapping them into net objects whether those be records or pocos or what have you right and then the other piece of it is a really really powerful query generator that translates your link or link q statements into a sql that's actually executed on the database or whatever query language happens to be right if you're using cosmos that might be different and the query generator is both the solution and the source of many of our issues um and i'll talk about that as we go on but suffice to say that writing a query generator that is going to generate something that executes against every version of sql going back to like i guess 2005 is a difficult challenge right so let's talk about configuration right so when you go to add entity framework to your application and you go consult the microsoft docs you get something that looks an awful lot like this which is great it's easy it's a one-liner who doesn't like a one-liner of code right that's why we see nested ternaries everywhere um and so it's really just we have the db context it's registered in our dependency injection container i'm using net apis not in the minimal mode here clearly connection string in this case stored in a configuration object and we're good to go right it works and it does just work um there are some things that are probably happening under the hood or that are happening under the hood that you may not actually want uh so for one there's no query retry who develops applications that live in the cloud who develop apps develop applications that live on premise who develops an environment that has 100 uptime on your sql server oh good no one's lying to themselves right connections go down who develops an environment where there's a dns problem once a month oh nobody's a million to that one either come on it's always dns right so query retries is important right and for a long time this was something that was sort of left to the developer to kind of handle um and in our defaults we don't have any retry that might be important you might want that um all queries are tracked what do i mean by that so in internal to the db context right so the db context is our container that holds all of our connection information and models and all the mapping information from the database it will track all of the results that it brings back and it does this for a couple of reasons one it does this because in theory if you ask for the same object four times entity framework core is again in theory smart enough to only ask the database for it once right um because nothing's changed the other reason it does it is because any framework core has a huge amount of change tracking logic built into it so that if i pull an object back from my database make a bunch of changes to it and call save changes it knows exactly what to do to persist that to the database or if i pull an object back i don't do anything with it and call save changes it does well you didn't do anything so i'm not going to do anything and this is this is really powerful right it automates a lot of that uh change tracking away from us but that might not be the best thing for your use case um related entities are explicitly loaded um fine um what is what does that mean it means that if i have a user that's a child to a post it's going to load that uh and you don't really get much logging of errors or warnings you get a connection failed so sorry try again later um again in a production environment it's fine in your development environment we might want some more information about why our exquisitely constructed link statement that's 40 lines long failed right so what can we do about this um well so first of all there's a couple of things this isn't all bad right i've kind of talked about that like there are some things that are fine and some things that are probably not fine right so you do want to implement a retry policy um you do want to use the proper tracking behavior for your use case i will dive into that much more in a few minutes you do want to log errors and warnings in the development environment and [Music] you may also want to log some sensitive data if you were trying to make your debugging easier um and you don't want to enable lazier eager loading proxies now microsoft is so on board with that one that those are no longer options unless you download specific nuget packages right um lazy loading is where we get the n plus one problem where well i got my users i iterate all through all their posts and now i'm like querying the database like mad and if you're in an on-premise environment the sand administrator's calling you if you're in a cloud environment your finance department's calling you and eager loading is i've just loaded the entire database by loading one user oops right so here's a suggested configuration again this is got an in is development flag make that a little bit bigger right so the first thing up there got a retry policy in this case four times one second each time now you're going to want you know just like we've you you hear this a lot at conferences this is example code don't do this in production uh you're going to want to do something more clever right so in a cloud scenario um you know you probably want some sort of like fibonacci back off right okay one second one second two right okay we all know that um but here it works fine right um and you'll notice that error numbers to add so the retry comes out of the box with a whole bunch of like the database didn't respond you had a login error uh there's a couple other ones but like it covers like your standard like six things that a sql server might like barf up if it just happens to not be there um but you can add some right so like if you get a uh well for example if you get a key conflict right or if you get a lock deadlock or something like that you can add those uh in this case i am putting in query tracking behavior no tracking that's more just to illustrate how you do that again we'll get to that in a minute and then uh in development we want to log some additional things and i do very strongly urge you to do this at least in your local development um if you're shipping your logs anywhere consult your security people first so the first is able to enable detailed errors so that gives us field level errors so if for example i try to stuff a six so in the united states all of our postcodes are five digits right and if i am taking a us-centric approach to my database design and i've done a car five or something like that or varchar5 and i try to stuff say a uk postcode into it it's going to tell me hey you're an idiot or at least you didn't think this through sensitive data logging so that is we're gonna get parameter values um really useful right like now i know that my id of one returned nothing or my id of one returned like caused an error and i can go look into that in the database don't do that in production don't do that in production don't do that in production and then i can configure some additional warnings so these warnings give us a different additional information that entity framer core is emitting but usually not doing anything with and in this case uh the two that i always care about are a first without order by and filter warning uh and so that's if i do a dot first or dot first or default and don't order buy or filter at all right uh the reason for that is most databases order is not guaranteed in your query results we all know that right or we've all had to find that out the hard way um and the other one is a row limiting operation without order by ironing right and that's for the same reason right order is not guaranteed so if i just call give me the first 100 of these give me the next 100 of these i might get the same 100 or i might worse get like 100 and then 50 new ones um right so these are useful things to get an error like oops i forgot to do this or you know maybe there's a use case here that i wasn't covered in my user story but i need to cover later on um and there are a huge number of additional core event ids that get uh emitted by any framework core some of them are kind of like oh i executed a query thanks uh some of them are marginally more useful um but you can go on the docs docs.microsoft.com in the framework core section and find out all the ones that there are there may be some additional that you want to use some of the some of the ones that used to be optional are now guaranteed which is kind of nice there are also relational event ids which tell you some additional information about how it's how any framework core is uh like dealing with your model against the database right so let's talk about crud by the way i'm really easy to spot at conferences these days i don't think i own anything that doesn't say west monroe on it um it's working at a place for a bunch of years does right so uh the first thing that happens so crud is create read update delete right okay we all know that so we'll take those in order and the first thing we'll talk about is inserting records okay if we insert one record we get one insert statement that's expected um but if we insert more than one record we got a merge statement and this is against sql server other providers will do different things um and so that that's sort of interesting right like why is entity framework core making that choice um and it yeah so those two lines right i just added a couple of users called save changes and got a merge statement um and you can see that i'm emitting all of the parameters okay so you can see the data that i'm shoving in there and then the sql query um so this is a really curious choice and it's a change in behavior who used entity framework six or previous okay we've got a few right so this is different than what at any framework traditionally framework did right um and it's not entirely illogical um and for those of you who know who brent ozar is he actually has a blog post from a couple years ago that digs into this much more but essentially so single insert statements are slow that's true ish inserting multiple records is more efficient again that's true ish right as with everything it depends um so the result of that thinking is let's use a merge statement to protect ourselves from strange sequel generation issues right because if i were to go hand write what it generated i would use some sort of table valued constructor in my insert right so that's insert my list of queries and then select and hard code all that stuff in there right and so they chose to use emerge instead right because building a table valid constructor is weird even though they actually are kind of doing that um so it's this isn't a bad choice right um this is of between using merge and using a bunch of insert statements this is fine right uh in sql server merge does have some issues with deadlocks this statement is not constructed in a way that you're going to run into that particular issue because we are just inserting to the table the deadlock issue happens when you're inserting or updating or deleting um but in your environment it may make sense to force single insert statements and we can do this um so the break point is three or more records so if you have one record or two record you get insert statements if you have three records you get merge it's an odd breakpoint but i suppose they did some testing but you can change the max batch size in your configuration and say my batches are one uh this is fine it does what you expect it to do now we'll get instead of one merge statement we'll get three insert statements we'll get into something that's not so good side effect of this a little bit later on right okay reading data i promised we were going to talk about query tracking so reading data we're just pulling data back from the database right simplest thing you can do with an rrm it's arguably the use case that you should be using an orm for um and there's a difference right so if you're going to simply return the data like through an api or to like a ui on your screen if you're in winforms land and you're not making any modifications to the entities that you pull back there is no reason to track them zero um but if you're going to be modifying the original entities um and saving them back then you should use as tracking now most of us when we're building apis we have discrete um read and update or delete or create endpoints you're very rarely like pulling stuff and doing a whole bunch of and putting it back in right in a simple case um so you can mix and match these right so you can configure your default behavior and then on your query itself say i do want this to be tracking or i don't want this to be tracking right and so it's choosing which one is your majority use case so in an api situation where you're serving a bunch of data it's going to be no tracking and then okay well we're actually creating so we need to track this guy um updates and deletes it kind of is what it is right however interestingly enough updates do not generate a merge statement they're just updates go figure they always happen in single statements so if i delete one thousand things i get one thousand deletes if i update one thousand things i get one thousand updates uh the only difference is if we mess with our max batch size uh we get either one connection or a thousand connections seems bad um so again that's want to pay attention to that speaking of easy ways to shoot yourself in the foot okay so bulk manipulation of data bad don't do that with any framework uh lazy loading entities bad don't do that with any rm uh using tracking for read-only scenarios not bad but you're costing yourself about 30 of your performance um may matter and then retry policy things fall offline for a minute so interestingly i harp on the retry policy because when i was putting together the code samples for this talk i was using a sql azure serverless which is great because it's really cheap it's not great because it goes to sleep every like 10 minutes of non-use and then the first query that you run after it starts up times out while it's waking up fun um and then your you know app throws a big giant exception because it couldn't even create a connection to the database because it wasn't awake great right okay so uh what are some other ways to destroy performance how many people run scaffold db context and just go with it great don't do that but using the auto-generated entities in general can result in far too much data coming back so uh loading too many related tables or too few so loading related tables.include that adds joins to your statement it adds complexity to your statement if you do like six of those fine if you do 60 of those write yourself a view um async the async promoting model is really really mature in.net at this point you should use it the async for each one that's a weird construct but it works and then defeating indexes by not understanding how your database uses them got an example for that guy right so what are the solutions here right so we can project only what we need out of a query and entity framework core is smart enough to only pull back what we need yeah that's gonna result in better performance uh only include the data you actually need right so if i'm loading up a user profile and i just have this like generic get user so all my examples are from stack overflows like users have posts right and i always load the posts like i probably don't need to if i'm just showing a user's profile um always use async unless you have a really good reason not to um and then learn a bit about how your database engine chooses which indexes to use like i said we'll get into that all right so how can we use select to limit fields so um everybody in here has used stack overflow right yeah okay so in the stack overflow database the posts table includes a whole lot of stuff um but say i'm just gonna show the body of the post well i really only need the body and probably an id so that i can keep track of it there's no reason to load the entire post entity so what i can do is define myself a custom post entity in a different namespace project our query right so that's what we talked about with the select guy that's a projection and then we get one much smaller sql which shows better in our log but two uh we get less data coming back right and so the less data you send over the wire the faster it's going to go right we can't defeat physics yet um just as an interesting thing you guys probably noticed i used single or default and i got top two which entity framework does to tell you if it gets more than one result so it can throw an exception at you it's nice um cool async yeah not not a lot to say that i haven't already said uh essentially the top one is going to perform better than the bottom one all else being equal now if you have 10 users who cares if you have 10 million users you're probably going to want to do this also if you have constraints about like auto scaling in the cloud things like um under load async is going to give you more responsiveness that's it uh indexing right i did mention i was going to talk about this so indexes are great right how many of you have a dba group that you get to work with a couple okay how many of you also play the role of the dba we've got a couple there too yeah so i have a slow query you could put an index on it that's going to speed it up right scribble grabble grumble sure um i will say indexes will also slow inserts updates and deletes because you have to update the index um but the really important thing here is that indexes are only used in certain circumstances this depends on your database engine so here we're talking sql server specifically if you have an index on a string field whether that be ansi or whether that be ascii or unicode and you use a wild card in your search and you start the you start the search with a wildcard sql server will not use your index so if you say ends with instead it starts with you are now no longer using your index and your query is going to be slow now depending on the size of your table may not matter depending on your sla for returning the data it may not matter but when it does matter it matters right so try to only use the with guy and there are ways to get around that so like you know if this is a really common use case you can create a computed column and reverse your column so that now you're just reversing what you're searching on um and then uh indexes are generally not used when you're doing an operation on the column so for example answer count divided by two um but yeah reach out to your dba team should you have one or you know put your dba hat on if you're the dba too because they want to help they really do this hey how can i speed up my slow query like well can you rewrite your query no hmm they want to help they really do how are we doing on time okay database design how many of you are just get to work on a product where you're designing your database from scratch let's go great how many of you have a database design that's like 20 years old yeah generally clients right because you only get one chance and once there's data in there changing a database is really really hard um it's it's a hard problem not to mention all of the governance around it uh right so some general guidance right do it right the first time for most of us that ships sailed but you know think about the proper design patterns and and specifically design for your access patterns right so you know 20 years ago when all we had were relational databases it was like yes you should go full like third normal form or whatever normal form right because that's a pure design and then along came these things called nosql databases um for not only sql you're not writing sql um and all of a sudden we started to think about like well maybe it's okay to have some data duplication if that speeds up our results right like if any of you are using cassandra right that's the entire that's like the entire design philosophy is you can have like 60 copies of your data and it doesn't matter because you get it fast um and then you know you have to like worry about keeping it updated but right but so discard pure relational patterns when it makes sense right so in a pure as an example in a pure third normal form pattern right you would have a country table that you look up against and then you would have a country id on your address that is silly you might have a country table because you might need to display those countries in a ui but having a foreign key to your address table is just adding overhead that you don't actually need because you have to do that join every time right so why not just chuck that text onto the address object um right poor design equals poor performance um so one of the first things i did in my career was moving migrating a bunch of foxpro data um i haven't lost my hair for nothing into sql server and for those of you who know anything about foxpro like it's fixed with and files and and the first attempt was we'll just replicate that pattern in sql server and that worked terribly so then we had to go and like redo it right so poor design poor performance uh avoid table per type this is a kind of this is one that is uh kind of more challenging in some cases um because any framework makes it really really easy to do one type one entity one poco one whatever to each table um 90 it's fine but there are cases where maybe you've split tables or you have like you know something that's better represented as an enum like in my country example um that like really like the shape in the database is not how you should be looking at it as an object or likewise the shape as an object is really not how you should be looking at in the database right so don't marry yourself to table per type use your databases tools right so these are things like views computed columns procedures shutter but doing these things can help reduce the amount of work you're making any framework and your database on jindoo and then lastly and perhaps most controversially on this slide avoid code first tools um i will preface that by saying code first for a quick start or you know something where you're just trying to get off the ground or again servicing like 10 users totally fine uh it doesn't scale well so we have uh an internal project that we use to help utilities in the united states manage people's applications for putting solar panels on the roof um because there's a lot of stuff that goes into that it's seven years old it started out life as a net mvc monolith with angularjs on the front of it and they used code first and it was fine because this was a really small use case with like you know two applications a day that our local utility was using seven years later now we're trying to you know strangle the monolith break it into services you know we've added functionality in microservices and all of the sudden code first is a giant pain because it's in one project in our monolith they've had to re-baseline like i think like 15 times to get any sort of decent performance so again if you're starting with it and you need to start fast or you're doing something small it's fine long term all right so what do some of these things look like so as far as entity framework core is concerned a view is just the same as a table um you just need to tell it it's a view so that it doesn't try to generate so it stops you from inserting things um so stack overflow again posts have a post type um now are you ever going to look at you ever going to pull back a post without its post type probably not there are a couple of different ways to solve it in this case i chose let's just add the type and the join and wrap it in a view so that we aren't having entity framework do that work every time because i don't have an example of the generated sql because then it will just generate select the view columns right from our post with post type view and it's just it it's cleaner right and you can control what the sequel looks like right uh right code first kind of talked about this a little bit but some other specific pitfalls uh data types there are some really unfortunate defaults there is no reason to store every string field as a unicode um now if you're building a big international app many of them probably have to be but not all of them uh you are left to define your indexes um you probably get an index on your primary key particularly if you're using sql server um but now we're adding indexes uh when we're running our migration that's gonna be slow which if you're running your migration as part of a cicd process the last thing you want is your cicd process to be slow because then people won't merge and build as often and right uh you got it written you have to write views manually um so you are going to end up writing sql there for me not so scary for other people maybe more scary or more annoying um and then code first doesn't gener generally doesn't utilize more advanced database features um again it's a size of data problem right most of the time you're not going to table partition most of the time you're not using in memory tables or things like that you might want some user to find types when i get to my i shot myself in the foot story i'll tell you about that so let's talk about the things that entity framework core is really bad at lots of joins um i had a great example of this from entity framework 6 that was like this wonderful arrow pattern of a query and i've lost the image unfortunately but if you have a lot of joins so include efcor is going to struggle to produce good sql it's just it it's not only it looks ugly but it is actually not optimal um etl yeah so how many of you are aware that um rebrickable publishes a data set of all the lego sets and pieces and minifigures yeah got one so rebrickable.com you can go and download the data set um and i got tired of using stack overflow so i said i'll use this lego data set this is great i like legos everybody likes legos they have an icrd you can do it it's really nice and they publish it in individual csv files so it's kind of on the user to figure out how to get that into a data structure right um so download the csvs i go well this is something small and quick i'll just use entity framework core for for loading it right not kind of kind of ignoring that at least one of the tables is over a million rows so i go and i write all the code and i do some like genericization so that i don't have to like write these you know write the same code 15 times i go run run it just on a small table first just on like a 10 row table like okay it works cool let me run it on everything got to that 1 million some of them were kind of slow got to that 1 million row csv over 1 million rows clc and i went oh i guess i'll go get a coffee so i went to get a coffee came back nope still not done ran my little test query like oh it's inserted like 3 000 records it's not good went and talked to my wife who's in the next room working from home is wonderful came back hit the hit the query fifteen thousand well i'll go like clean the cat box and clean the cat box came back upstairs my computer had gone to sleep and the application stopped because i was loading everything locally in a docker container see i didn't follow my own advice so uh yeah don't do hpl within the framework core it's going to be very slow and it's going to be very painful i rewrote it to use a table value parameter which you create a table type in sql server you shove all the data into a data table and net and call a stored procedure and it's an insert and it runs very quickly and now it runs in less than two seconds to load that one million row table right um don't be me follow my advice uh and lastly uh parameter issues um you will know this if you run into it and that is you know parameter issue can be as simple as you're feeding a unicode string in where it's an ascii string or you're feeding ascii string in where it's a unicode string and sql server says well these things are close enough i'm not going to complain about it but i'm also not going to use an index thanks um and there there's just some times when it's going to do the wrong thing you're going to have to write sql to fix it unfortunately okay so the primary goal and this is really important to remember any frame framework core's primary goal is to is reliable execution right now the team has done a ton of work to improve performance like if you haven't used it in a while or if you're using an older version it is worth updating to six actually was worth updating to five it's really worth updating to six um but it's its goal is to be reliable right so the sequel it generates is not going to be the most performant this really matters when you're going against cosmos db because how do we pay for cosmos db yes by query time great um the other thing is that the pre-compilation items so you can now compile your queries again with ef core six don't have a demo for that unfortunately um but that can also provide some better performance and also more uh repeatable performance um but the thing is there is significant overhead associated with all of this right it is a full featured orm with all the good and bad things that that comes with um and you know that just creates overhead which is fine right if you really really need performance raw ado or something like dapper right like if that if performance matters that much use something that is closer to essentially writing it yourself it's going to take more work so there is an roi there right but yeah that's uh that's it right um so i've gone very fast i hope you guys have questions uh right so understand how it works how it doesn't work the docs are your friends they're really much more useful than they used to be you want to measure performance and if you've done one and two you're going to have three which is have a plan to fix it when it goes bump right that happens to everybody this thing that worked all of a sudden doesn't work right all right questions i went very fast no questions i i was i was perfect i'm gonna see nothing but greens in the bin out there and then all the reds in the little tupperware thing right um yeah so that's that's really all i have um if you have questions that you don't want to ask in front of the entire room feel free to come up and ask me as you are leaving please make sure you fill out an evaluation that's very helpful for me it's very helpful for the organizers and thank you all very much
Info
Channel: NDC Conferences
Views: 8,239
Rating: undefined out of 5
Keywords: Dan Mallott, .NET, Database, LINQ, applications, apps, Entity, Framework, NDC, Conferences, 2022, Live, London
Id: ZKVXl2640ps
Channel Id: undefined
Length: 44min 3sec (2643 seconds)
Published: Mon Jul 25 2022
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.