Kevin Kline - SQL Server Internals and Architecture

Captions
This session, if you consider yourself to have expert-level knowledge of SQL Server, will probably be a review for many of you. It's really intended for those of us who work with SQL Server a lot but don't work with it very deeply; we don't spend a lot of time deep inside of SQL Server. One of the things I learned in my career is that, as professionals in the specialty we've chosen, there are certain models we can follow in terms of the skill sets we should emulate. I was actually one of the founders of PASS back in 1999, and at that time I ran the database management team for a big consulting and auditing company called Deloitte, and we spent a lot of money trying to figure out what the highest-performing teams in the world were. For those of you who are interested in sports, you might be interested to learn that some management consultants consider the highest-performing teams in the world typically to be the pit crews of racing teams. Anyone watch racing at all? You look at these pit crews and there's a person who does just one or two tires, there's a person who just handles the fuel for the vehicle, and so on: very specialized skills. But even if all you do is change tires, there's not a person on that team who doesn't know everything there is to know about an internal combustion engine. You're not going to hire someone for your racing team who doesn't know what the pistons do, what the fuel injectors do, what the spark plugs do, and so forth. There's an analogy for us there, because we are IT professionals, yet many, many times I talk to people who have no idea what the relational engine they're a team member on actually does. They don't understand it; they don't know why it works the way it does. That's the intention of this session.

I have two blogs: KevinEKline.com, my professional technology blog, and ForITPros.com, a blog for those of you trying to develop and cultivate your soft skills, leadership skills, management skills, things like that. One of the things I've found is that many of us get promoted higher up in our organizations because we're the best technologists on our team, the best developer, the best DBA; we get promoted, and then everything that made us successful in our career up through that point no longer matters. Isn't that a smart way to promote? The good news is that those skills can also be learned.

I work for a company called SQL Sentry. We make a variety of monitoring and management tools for SQL Server. One of our most popular tools is a free query tuning tool called Plan Explorer, and it is and always will be free; it's so free we don't even ask for your email address, so just download it. Sixty thousand users love it. There's a lot of other material here too, so if you want, take a picture, save it for later, and send me an email to take advantage of any of these options. One other thing: we do have a pro version, PE Pro, and I got permission to extend a special discount, the one you would normally get if you bought ten or more licenses, so if you're interested in the pro version, just drop me a note. If you're not, get the free version; it's exceptional, it really is a good tool. In fact, if you read a lot of the blogs, most of the people who blog about SQL Server and write about query plans and execution plans are writing about them using images taken from our tool rather than Management Studio, because it's much, much better than SQL Server Management Studio.
So, our agenda. This was actually loosely based, for the first time, on a book I really like by a friend of mine named Christian Bolton in the UK. You can probably tell from our first topic that I wrote this originally for Amsterdam: we're going to drop some acid early. We'll go through the internals of the SQL Server engine from two standpoints, first as a read operation, a SELECT statement, and then a second time as a write operation. Each time we'll look at the different elements of the engine, but we're not going to go super deep; if you have questions we can talk about that, or I can point you to some other resources if there's something you really, really want to know about.

So, everybody ready to drop some acid first thing in the morning? We're not talking about lysergic acid, and we're not talking about hydrochloric acid; we're talking about this: one of the major factors in any relational database system that governs the way it is architected and designed, the acronym ACID, the ACID properties of transactions. This was the big innovation of relational database systems. When they came out, the preceding most popular methods for managing data were either what they called hierarchical or network structures, or something like flat files with COBOL and Fortran. That's actually how I got my first professional job, for NASA: I was a really good Fortran programmer. Anyone here used Fortran? And you don't even have gray hair, how is that possible? I don't have hair, I have an excuse. So, different ways of doing it. There was an innovative group of people who said, you know what, the way we're doing things now introduces a lot of problems. For example, if I delete a sales record, I might have lots of sales detail records left behind, so we have to make sure we introduce some methods into the way we set up our systems so that those kinds of things don't happen. In particular we want to protect money; whenever it comes to money, people tend to get a little bit defensive, and if you lose track of it over time, they don't like that very much.

So, for those of you who've taken a class on it and studied it, what does the A stand for in ACID? Not many people have taken a class on it. Yes, atomic. So it means relational databases are nuclear-powered, right, big mushroom cloud? Okay, that joke failed. Think about chemistry. In chemistry we have compounds, like perhaps a cup of coffee, and within compounds we have molecules: the water molecules, the lactose molecules from the milk or the cream, the sugar molecules, the caffeine molecules. Below that we have the atoms that compose those molecules, say carbon, oxygen, hydrogen, and if we split those any further, in a chemical sense they lose all of the characteristics that make them unique. It's no longer carbon if we break it any smaller; it's no longer hydrogen. The corollary here for us and transactions is that they are atomic: they are just themselves, they have only the properties of themselves. They're not a compound, they don't do multiple things; you can't do an insert and an update at the same time using the exact same logic, you have to do one individual thing.
Okay, what about the C, what is the C in that acronym? I heard it: consistent. So what's a consistent transaction? Come on now, you guys are just testing me. It's the same every time, right? If I'm going to withdraw $100 from an account, it has to be $100; it can't be 98, it can't do things a little bit inconsistently. How about I? Isolation, right: it has to be isolated, we cannot affect other transactions or other users at the same time, or we're going to have data anomalies of different sorts. Each one of these, actually in the order we're listing them, creates characteristics inside of the database system with increasing overhead. Atomicity is the least overhead-inducing of these; then we go to consistency, which produces a little more overhead, because we have to have things inside of the relational database that control consistency. It's called locking; we have to have a locking engine, and the locking engine also applies to isolation. So for those of you who feel like, whenever I use the NOLOCK hint I'm actually going faster: there may be situations where you go faster, but you're also disabling certain characteristics that are considered necessary for a good relational system.

How about D? Durability, thank you, very good. A transaction must either be fully committed or fully undone; we can't do it halfway. If a lightning bolt strikes our server while we're doing a big insert, say a big set of cash deposits, those either have to go all the way in or not in at all. How many of us have actually restarted a SQL Server and watched in Management Studio as it starts to send all those messages across the screen? For each database it'll say starting up such-and-such database, and then what does it say at that point? It'll say things like five records rolled forward, three records rolled back; then it goes to the next database: seven records rolled forward, three back. It's actually applying all of those durability aspects of the transactions, and that whole infrastructure is what we call the transaction log; on Oracle it's also called an archive or redo log. So lots of different things inside of the relational engine are required for the ACID properties of transactions.

In fact, if we decide we don't need all this, we can have a really, really fast database. Anybody use MySQL? A couple of you; you look kind of scared, like the MySQL people will give me a beat-down here. Well, in the early days, one of the reasons it grew in popularity so much was that you didn't have to do any of this. You can have really, really fast transactions when you don't apply any of the ACID properties, because you don't have any of that overhead; of course, if anything happens that you haven't programmed for, then you could lose data, or have inconsistent data or incorrect data. That's also one of the things that's really popular about NoSQL databases like Cassandra or MongoDB, or Hadoop and HBase, things like that: they do not guarantee these sorts of things. Facebook uses something called eventual consistency. Does that sound good if you're doing a bank transaction? It'll eventually get updated, sometime tomorrow, next week, something like that? You see, Facebook doesn't care how quickly our pictures of our cute little kitty get updated. People have a very different threshold for how much they care when it's not related to money.
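To make the ACID discussion concrete, here is a minimal T-SQL sketch of an atomic, durable unit of work. The dbo.Accounts table, its columns, and the account IDs are hypothetical, invented just for illustration; the point is that the two updates either both commit or both roll back.

    BEGIN TRY
        BEGIN TRANSACTION;

        -- Withdraw $100 from one account (hypothetical table)...
        UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountId = 1;

        -- ...and deposit the same $100 into another.
        UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountId = 2;

        -- Durability: the commit hardens the log records to disk first.
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Atomicity: any failure undoes the whole unit of work.
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        THROW;   -- re-raise the error (THROW needs SQL Server 2012+)
    END CATCH;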
When it's related to money, we care a lot. So those NoSQL systems that churn through enormous amounts of data in those big-data scenarios really weren't intended to be monetary systems.

Now, another reason we have a lot of the different characteristics inside of SQL Server, the different attributes and parts of the relational engine, is because we want it to be faster. That's simple enough. Part of the way we do that is by maximizing hardware. Anyone know when relational databases were first cooked up, just the decade? Anyone? The 70s, thank you; I'm going to have to give him a copy of my book, SQL in a Nutshell, unless you guys want to step it up a little in the rest of the presentation. So yeah, the 70s. And what was the primary input/output system in the 70s? It was actually tape back then, and then hard disks started to come on pretty strongly at the end of the 70s and into the early 80s. I remember I bought my first hard disk: it was a 10 megabyte hard disk, and it was about the size a microwave oven is today. So, as I'll explain as we go through, all kinds of stuff is designed in there to take advantage of and maximize hardware. Part of the problem with that is that hardware paradigms pass away with time, so there's something new you may have heard of called Hekaton, and that's built on entirely new ideas of what hardware is and can do for us, rather than the older disk-based subsystems. Then finally, the other reason there are a lot of architectural choices inside of SQL Server is basically because Oracle does it; that's always a big motivator for Microsoft. If the competition does it, then we need to do it too.

All right. Now, when I first wrote this presentation way back years ago, nobody knew who this guy was, which really surprised me. You guys get The Big Bang Theory here? Yeah, woohoo, Sheldon Cooper, right. He's going to be our tour guide, and what I thought I would do, since this is kind of a dry topic, it's not terribly engaging, is rather than just throw out facts, I'd try to make it a little more like a story. Imagine Sheldon is a bicycle courier carrying a message; he has to get the message through, and this is the map he has to navigate: the internal architectural structure of SQL Server. Any questions? We're done, right? Just kidding.

So like I told you at the beginning, we're going to start with Sheldon doing a read operation. Imagine you're sitting at your console; it could be Management Studio, it could be a website, it could be a client-server system. What happens behind the scenes is the same consistently throughout any of those ways of interfacing with SQL Server; it will behave the same way. Let's say he's going to book a trip to a conference, so he goes to his favorite travel website, and there's the box that says where do you want to fly from and where do you want to fly to. It's going to do a lookup to make sure that the code he types in is a real city; it's going to do a SELECT statement to see if that code exists in the table that has all the airport codes. What it will do is take that query and wrap it up in something called a Tabular Data Stream (TDS) packet.
TDS was something invented by Sybase way back in the day, and it's still used by Sybase and by Microsoft SQL Server today. That TDS packet is then going to be picked up by the SQL Server Network Interface, the SNI, and it will use one of the various protocols that have been specified to carry that message to the SQL Server. So what would a protocol be; what's an example of a common protocol? TCP/IP, the biggest one on the block. What are some of the other ones? Named pipes would be another. I had one person once say ISDN; I don't even... what is ISDN? There are some other ones: VIA, shared memory, and some of the older ones. Named pipes is probably one of the longest-standing; there used to be SPX/IPX if you used Novell, and Banyan VINES, all kinds of old ones like that. So let's say it's TCP/IP: it's got a TDS packet that it then stuffs into a TCP/IP envelope, and it mails it across the wire to the SQL Server. SQL Server, by default, will catch it on port 1433, the TCP/IP port, and SQL Server will then pick it up at the command parser.

What does a parser do; what does it mean to parse something? I think the word parse in English means to cut up, right? So it cuts the statement up into its language elements. If instead of typing SELECT he had typed SEELCT, transposing two of the letters, it would say: I don't know what this word is, I don't understand it. It looks at the statement grammatically and checks the syntax. It says, ah, SELECT, I know what this is, this is a SQL language element, so I can go forward with it; if it was SEELCT, it says, I don't know what this is, I'm going to throw an error back to you.

Then SQL Server begins to try to do what we said earlier: it's going to try to maximize the hardware. Using memory is no less than 16 times faster, and probably as much as 300 times faster, than actually having to resolve this by going to the I/O subsystem. So SQL Server says: have I done this SELECT statement before, anytime recently? If I have done this SELECT statement, or something very, very close to it, then I can reuse the execution plan that I've already created; I don't have to make a new one. So it's going to check in the buffer pool. Now, the buffer pool is all of the memory that SQL Server has access to. There are two main areas you can see in the box there: there's the data cache, for data, tables, temp tables, things like that, and then there's the plan cache. The plan cache is used for execution plans: ad hoc SQL, stored procedures, triggers, things like that. Are those the only caches that SQL Server has? No, not at all, not by a long shot. Those are the two biggest ones, but there are probably at least half a dozen others: there's an access cache, there's a sort cache, there are hash buckets, all kinds of things that SQL Server puts into memory. If you want to see what they all are, you can use the DMV called sys.dm_os_memory_clerks, and you can see what each one is and how much memory is being used for it. Just an interesting difference between Oracle and SQL Server: in Oracle you can personally adjust every cache, there's a knob for each one; in SQL Server there's only a knob for the total amount of memory that SQL Server gets. You can say what's the minimum and what's the maximum, and that's about it, though there are a few more knobs they've added in recent releases.
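The memory-clerk DMV just mentioned is easy to query. A rough sketch (the pages_kb column assumes SQL Server 2012 or later; older builds split it into single_pages_kb and multi_pages_kb):

    -- Top memory consumers by clerk type: buffer pool, plan cache, and friends.
    SELECT [type],
           SUM(pages_kb) / 1024 AS memory_mb
    FROM   sys.dm_os_memory_clerks
    GROUP  BY [type]
    ORDER  BY memory_mb DESC;

MEMORYCLERK_SQLBUFFERPOOL and CACHESTORE_SQLCP / CACHESTORE_OBJCP near the top of that list are the data cache and plan caches the speaker is describing.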
So imagine again he's pedaling on his bike. The protocol layer is kind of like a little kiosk, a tiny little building; you can just shoot right by it and go straight to the command parser. But the query optimizer, which I've drawn in a three-dimensional box, is like a 72-story building, and we want to be very careful about delivering messages there, because we might have to go to the very top of that building. We want to avoid going there if we can. So after we check to see if the plan is in the plan cache, if it is, we reuse it. And sometimes we can reuse improperly: you'll see sessions about things like parameter sniffing, or how to optimize plan reuse, different names like that. That's because you can do things in the way you code that will cause plans to be reused improperly, or to not be reused enough, so there's a lot of opportunity to do a better job with how you manage your plan cache.

Then, assuming it's not in the plan cache, we do have to go visit that big high-rise. What SQL Server does is create a query tree and hand it over to the optimizer. Now, the optimizer goes through what is technically like a four-step process, but it begins at pre-optimization and then goes to phase zero, phase one, and phase two; so it's four steps, but it only counts up to number two. I'm not sure I understand all of that either. An interesting thing about the way SQL Server chooses to optimize a query plan is that it does not look for the best execution plan; it looks for the optimal execution plan. What's the difference between best and optimal? The cost of calculating, yes, that's definitely part of it; the big difference is time. To get the best plan, it might take 30 seconds for the optimizer to evaluate the cost of every option; to get the optimal plan, Microsoft has kind of a hard rule that says: whatever is the best you have found in half a second, choose that. That's the default behavior of SQL Server. You can adjust it, there's a trace flag to tinker with those settings, but by default it's going to pick the optimal plan, whatever is the best it can find in about half a second or so.

So it goes through this multi-step process. The first step, pre-optimization, produces what's called a trivial plan. That might be if you say SELECT GETDATE(): I'm not even getting a value from a table. Trivial plans it doesn't really even cache; it clears them out immediately and doesn't put them up into the cache. The next phase is what we call phase zero, and it looks at only three patterns: a select from a single table, a select of a couple of columns, and a few other things like a very, very simple join. That's phase zero, and it produces what are called transaction processing plans. The plan level after that, phase one, produces something called quick plans, and that's where it starts to make a lot of choices: I see there are a couple of joins, there's a WHERE clause with a function call, and now I can start to look at multiple indexes, maybe I want to use this index and not that one. It also begins, at that phase, to apply heuristics: not only is it making choices about which plans to consider, it's also choosing to prune off potential parts of the execution plan analysis. It's cutting those options off; it won't go all the way down that path or down this one.
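One hedged way to watch plan reuse, or the lack of it, is to join the plan cache DMV to the text of each cached batch. A sketch; usecounts shows how often each plan was reused:

    -- Most-reused plans in the cache. Many near-identical ad hoc statements
    -- with usecounts of 1 is a classic sign of poor plan reuse.
    SELECT TOP (20)
           cp.objtype,                          -- Adhoc, Prepared, Proc, ...
           cp.usecounts,                        -- times this plan was reused
           cp.size_in_bytes / 1024 AS size_kb,
           st.[text]
    FROM   sys.dm_exec_cached_plans AS cp
    CROSS  APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    ORDER  BY cp.usecounts DESC;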
It's going to say: I'm not going to try a hash join, I know absolutely that I'm not going to do it. And then finally there are the complex queries; that's when it goes all the way to phase two, which is when we have parallelism, we have all kinds of sorting, we have all kinds of things happening at that stage. If it goes all the way to that full plan, that's where SQL Server is sending Sheldon all the way up to the 72nd floor of that building; he's had to spend a lot of time there. I'll explain how SQL Server handles those in the plan cache in just a moment.

So now it's got it all figured out. It's just a logical plan at this point; it's not physical, it's not actually pulling data from the disk or the I/O subsystem, but it knows what it wants to do. It then hands off a query plan, sometimes called a query fingerprint as well. It's actually not human-readable; if you were to look at it in a debugger, it's a hash, about one twenty-sixth the size of the written text. It hands it over to the query executor, or executer, however you like to pronounce it; both of those remind me of death for some reason. An executioner is the person who cuts off your head with a great big battle-axe, and an executor is the person who takes care of your will after you've died, so one way or the other it has a morbid connotation for me. So then SQL Server uses a little internal OLE DB mechanism to hand the plan off from the relational engine, the query processor up at the top, which is all logical, to the storage engine, which is all the physical work SQL Server has to do to actually give Sheldon the answers to his question.

The first element it hits in the storage engine is the access methods. If you look at a query execution plan, and I hope you have looked at execution plans, you'll see that there are both logical operations, like an inner join, and then physical operations, where it says it's a left semi join or a left anti semi join, different things like that. It'll tell you what index it's acting upon and how it's getting all that data, and what the cardinality estimates were. That's where it starts to look at how to get the data out, and that's why statistics are so important. This is another big bottleneck in relational database systems: what indexes do you have that make the shortest path to the data? It will make choices based on the indexes, on how fresh those statistics are, and on how accurate the statistics are for the way you've designed your tables. You might have a situation in which a table has a primary key but no clustered index, and it's very prone to page splits and very fragmented; the access methods will take into account that there's no clustered index to look at, so it will try to use a non-clustered index, and take that kind of hierarchy of choices down to the actual data.

Now we come back to that other touch point from earlier, maximizing hardware, and that's where the buffer manager comes in. Again, Sheldon gets on his bike, pedals over from the access methods to the buffer manager, and asks the buffer manager: where is my data? It can basically be in one of two places: it can be in the cache already, because somebody else came along earlier and asked for all the airport codes.
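The "query fingerprint" mentioned a couple of paragraphs up surfaces in the DMVs as query_hash and query_plan_hash (available in SQL Server 2008 and later). A quick sketch:

    -- Statements that differ only in literal values share a query_hash,
    -- which is how "this statement or something very close to it" is found.
    SELECT TOP (10)
           qs.query_hash,
           qs.query_plan_hash,
           qs.execution_count,
           st.[text]
    FROM   sys.dm_exec_query_stats AS qs
    CROSS  APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER  BY qs.execution_count DESC;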
If it's in the cache, then once again, just like with the execution plans, we save ourselves an enormous amount of work, because it's right there in memory. But if it's not in the cache, we have to go to the I/O subsystem, retrieve that data, and load it into the cache. Now, I do want to point out that there have been times when I, and I know other experts, have said that's when SQL Server has to go directly to disk; that's actually a misstatement, we're being inaccurate, because SQL Server never goes directly to disk. If it's manipulating data in any way, it's always doing it in memory: it grabs the data from disk, loads it into memory, does its work there, and then, if it has to, sends it back out to disk. So we look to see if the answers to his questions are in cache; if they are, great; if not, we have to read them from the data files and load them up into cache. And that's another huge topic: I do a whole full-day session that's largely about tuning your setup and configuration for SQL Server, because how you set up your files and filegroups and what kind of subsystem you put them on, whether it's RAID 0 or RAID 5 or RAID 10 or SSDs, all of that can make a huge difference in the performance of a system. It could be the exact same queries and the exact same database design, but just the way you've laid those out on files can make a huge difference; again, an area for further study. So, whether it's in cache or has to be loaded from disk into the cache, it is then returned, actually kind of a round trip back through the whole system: it's handed back to the buffer manager, the buffer manager says, okay, now I've got all the information I need, it lets the access methods know, it pipes it all the way back through, and Sheldon finally gets his answer. So that's how a SELECT statement works.

Now, another part I need to tell you about; let me go back. This was just Sheldon by himself, but what if there are 400 people like Sheldon trying to book travel plans? What happens then? As requests move from the protocol layer into the SQL Server engine, there's a component of SQL Server called SQLOS; in older versions of SQL Server it used to be called UMS, the User Mode Scheduler. With SQLOS, and again I'm going to use an analogy, each CPU that SQL Server has, logical or physical, whether it's a hyper-threaded CPU or a real core on the chip, is given a scheduler. You might think of a scheduler as kind of like the drive-through window at one of these fast-food restaurants. Normally, when Sheldon connects to the system, he's given a thread on that scheduler, a SPID, if you look in sp_who or the Activity Monitor. He asks for all the airport codes, the scheduler gets some time on the CPU, processes his request, and hands him back what he asked for. However, sometimes things go a little bit bad, they don't work out too well: maybe I asked wrong, I didn't phrase my question well, or maybe there's something wrong back behind the counter. What happens here is worth mentioning as well, because it's very important not just for understanding how SQL Server handles multiple users, but also as your first avenue for troubleshooting and performance tuning. Users are assigned to the different threads they're given, and a scheduler can handle lots and lots of threads at one time.
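You can see those per-CPU schedulers, and how long the "drive-through line" is on each, with a DMV. A rough sketch:

    -- One row per scheduler; runnable_tasks_count is the number of workers
    -- queued up waiting for their turn on the CPU.
    SELECT scheduler_id,
           [status],
           current_tasks_count,
           runnable_tasks_count
    FROM   sys.dm_os_schedulers
    WHERE  [status] = 'VISIBLE ONLINE';   -- user schedulers only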
If you are able to get all the resources you need, you're put in the running queue: you're actually on a CPU, your work is being done. Everyone else who has what they need but is waiting their turn is put into a runnable queue. At this point, for any worker assigned to a thread, SQL Server begins to accumulate wait statistics: anything that makes you wait to complete your operation is accumulated and put into a category, and you can query the wait statistics, sys.dm_os_wait_stats, to see all the different places where SQL Server is waiting to get work done. So we might have a situation like: oh, we're out of soda at the soda counter, so, hey, let's go get some more of the stuff we need to make sodas. The same thing happens in SQL Server. You're told: I can't give you all the records you need; I can't lock them all, using the ACID properties of transactions to keep them consistent and isolated and durable, because somebody else has them; so why don't you step out of line for me, and once those other locks are released I'll let you make that update. That's when you are put into the suspended queue, and as you're moved out of the running queue, someone else in the runnable queue gets to step up and get their work done. Everyone in the runnable queue, in the broadest category, is accumulating a signal wait; a signal wait tells SQL Server: I've got everything I need, I'm giving you the signal, I've got all the locks on all the records I need, just process me. The other kind of wait, like I need these extra records to complete my transaction, those are called resource waits. So you can look in SQL Server to get an idea of the difference between signal waits and resource waits, and then you can take it further: how much waiting is there for compile time on CPU, and so on. If it's an LCK_ sort of wait, that means it's a lock-related wait; if it's CXPACKET waits, that means it's related to parallelism. Looking at sys.dm_os_wait_stats will tell you what your main problem areas are inside of SQL Server.
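A hedged sketch of the wait-stats query the speaker describes, splitting each wait type into its resource and signal components:

    -- Resource waits: time spent waiting for locks, I/O, latches, etc.
    -- Signal waits:   time spent in the runnable queue waiting for CPU.
    SELECT TOP (15)
           wait_type,
           waiting_tasks_count,
           wait_time_ms - signal_wait_time_ms AS resource_wait_ms,
           signal_wait_time_ms
    FROM   sys.dm_os_wait_stats
    ORDER  BY wait_time_ms DESC;

In practice you would usually also filter out benign idle waits (the various SLEEP_ and XE_ types, for example) before drawing conclusions.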
Now, one other thing to dive into a little deeper, looking back at the big picture from earlier, is those caches again. Memory is so much faster than any other method we might have for accessing SQL Server plans, data, and things like that. So what happens inside of those caches? How long does a page stay in those different caches? Does anyone know the kind of algorithm that's used to keep data or execution plans in cache? First-in first-out? LRU, that's what I was trying to think of. LRU means least recently used; if you're an algorithm aficionado, it's the LRU-K algorithm, and you can look it up at SIGMOD or the ACM. What happens is controlled by a very important process that runs in the background of SQL Server called the lazy writer. I actually usually ask potential DBAs, when I'm interviewing them, to tell me what the lazy writer does; this is one of those internal things, just like in the racing scenario, what does the spark plug do: it's internal, but it's very, very important. For me, when I think about the lazy writer, I always mentally visualize that old 60s movie with Jack Nicholson and Peter Fonda on big Harleys riding around America smoking marijuana and things like that; but that one's called Easy Rider, so don't tell someone it's the Easy Rider that's doing this. I've made that mistake.

So here's what happens with LRU, least recently used. Remember, go back to that diagram we had a minute ago: how big a deal was the query optimizer compared to, say, the network protocol? The network protocol was like a little kiosk, like where you wait for the bus, you just go right through it; the query optimizer is like the 72-story building. So when we talk about execution plans, here's what happens. We execute a stored procedure called GetOrder, and SQL Server creates two counters for GetOrder. The first one is the complexity value, that's the one on the left, and on the right-hand side is the complexity counter, which counts down about every minute or so, or whenever a thread requests it. The lazy writer, and in this case I'm just using Sheldon as the lazy writer, wakes up and takes a look at everything in the cache. You can see in the cache that there are some things that are really not complex at all, SP4 and SP1, they just have values of two or three, and notice that everything gets decremented by one: now SP4 has a value of one on the counter side, but the complexity value is still set to two; the complexity value doesn't change. The lazy writer goes away, comes back about a minute later, and takes another look around at all these different complexity counters. Remember I said SQL Server will check whether an execution plan exists in the cache: if it does exist, it doesn't have to create a new one; if it doesn't exist, it creates a new one and loads it up in there, like it did for GetOrder a moment ago. Some of these, over time, get decremented down to zero, like SP4 over there in the corner. What do you think SQL Server will do when the lazy writer comes back and finds one of these stored procedures or triggers or views or select statements, any of that kind of stuff, at zero? The most accurate answer to any technology problem is: it depends. In this case, if there is memory pressure and the plan has been marked at zero, SQL Server will get rid of it; but if there's no memory pressure, SQL Server says, I'll just leave it in there in case somebody needs it, because I have plenty of memory. That's actually called a ghosted record: it's not yet deleted, it's been ghosted. If you look at the SPIDs from 50 downward, you'll see there are SPIDs in there that say things like broker cleanup task, and there'll be one that's a ghost cleanup; that's what takes care of any of these that have hit zero and need to be dismissed so we can make room in memory. So if there's plenty of room in memory we keep it, but if there's memory pressure we get rid of it. And if anyone comes along and executes one of these, like FindUser, somebody comes along and reuses that execution plan, we re-establish that number; we reset it back up to its complexity value. So the idea here is that the harder our code is to compile, the longer we want it to remain in the cache; the easier it is to compile, the less we care about the fact that we have to recompile it and build a whole new plan for it.
So the really easy stuff only gets to stick around for a short period of time, because it's easy to recreate, but the really complex stuff gets to stay in a lot longer. By the way, how realistic or useful are those complexity values when you look at them? If you were to pull that number up in the internals and look at it, what would it mean? Actually, it wouldn't mean much of anything, because it's internal to SQL Server's operation. It doesn't correlate to milliseconds of time; it doesn't have any correlation to anything except how complex this particular piece of code is relative to all the others out there.

Now, another question people have asked me is: what about the buffer cache? The plan cache is one part of the cache, the buffer cache is another; how different is it with the buffer cache? The short answer is: both very little and enormously, hugely different; it's either not different at all or almost entirely different. What am I actually saying? In generalities it works very similarly: the older a page is and the less frequently it's been used, the less likely it is to stay in cache; those kinds of rules are all the same. However, there are, like, thousands of little, I'm exaggerating, hundreds of little asterisks at the end of that, saying: well, if the table is between 5 and 10 gigabytes, then this proportion will be stored for this period of time based on how much memory pressure there is, and if it's between 10.1 and 20 gigabytes in size, then something else, and so on. There are all kinds of if-this-then-this-else-this rules for how the buffer cache is managed. So in generalities it's the same, but in the specifics it's quite different, because there are all kinds of alternatives based on how much memory pressure is happening on the buffer cache. Is that kind of understandable? Does it make some sense? One real well-known option in the buffer cache is that if your table is 64 KB or less, so you have a small table, it will always do a table scan and load the whole table in, just because that's the easiest way for SQL Server to grab all that data, in a full extent, at one time; it just throws it all in there. Whereas if it's a really big table, it starts to get more and more smart about how it keeps all of that data resident in the cache.
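For the buffer cache side, one way to see what is actually resident in memory right now is sys.dm_os_buffer_descriptors. A rough sketch, aggregating the 8 KB pages by database:

    -- How much of the buffer pool each database occupies at this moment.
    SELECT DB_NAME(database_id) AS database_name,
           COUNT(*) * 8 / 1024  AS cached_mb   -- pages are 8 KB each
    FROM   sys.dm_os_buffer_descriptors
    GROUP  BY database_id
    ORDER  BY cached_mb DESC;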
All right, so we've talked about reads, we've talked about how the cache works, we've talked about how the SQLOS component works; let's talk about writes now. Same layout that we had before, and things work pretty much the same way. Sheldon is now actually going to start to book his ticket to a city; he's gone through all the checks, and he's going to do an insert, an update, a delete statement, maybe a MERGE, which is also a DML statement. Incidentally, things like GRANTs and DROPs are also write operations; it's just that they are write operations against the system tables, giving you permission to different things. CREATE TABLE, DROP TABLE, CREATE PROCEDURE, those are all write operations as well; they just don't write to user tables, they write to system tables, and they produce the same kinds of locks and the same kind of activity here.

So Sheldon sits down, and now he's going to book his trip to Brussels or Antwerp, wherever he's going. The request is put into TDS packets, which are then stuffed into TCP/IP packets and sent across the wire to the SQL Server; it catches it by default on port 1433 and hands it off to the parser, and the parser says: do I recognize this, is this a language element or not? If it is, then it checks to see if it exists in the plan cache; if I've already got it in the plan cache, again, I can save myself a lot of time and trouble. If it doesn't exist, then I need to build it, so I'm going to hand over a query tree to the optimizer, and again we go through that four-phase process and produce the query plan.

Now, one thing that's interesting to note: when I started my career, I actually started out writing a lot of reports and a lot of select statements; I was really good at it and enjoyed it. Then I was asked to start writing the data-modification kind of work, insert and update and delete statements. When you write a delete statement, hopefully you're first doing a select statement, right, to make sure you're going to get just the data that needs to be deleted, no more, no less? Does that sound familiar? Yes? Okay, thank you, I'm reassured. In one session where I gave this lecture, one guy said, no, I just use a ROLLBACK if it's not right. I thought, I don't want to work where you work; I'll just undo it if I make a mistake, wow, okay. So normally, if I'm going to write some deletes or updates or something like that, I'm going to write a select statement first to make sure I get the right data (there's a sketch of that pattern below). One of the things that was really surprising to me is that when I looked at the execution plan for the select statement, it was a small, tight execution plan, nothing very scary; but when I looked at the update statement, it was huge, it spanned page after page after page. Why would an update or a delete or an insert be so much more complex than a select statement? It has to check all the indexes, yes; keys, triggers, thank you. If there's an update trigger on that table, it has to build all of the trigger logic into the execution plan, absolutely, and I expected that at the time. What I didn't expect was the impact of keys. What is a primary key? It uniquely identifies each row. And what is a unique key? It also can uniquely identify each row. How do we tell if a value is unique? We have to do a select against all the values in there to make sure it's not colliding with a pre-existing value, so now we have to build the execution plan for that select statement in. If we have a primary key on these three columns and a unique key on another column, we now have two new select statements that go into that execution plan. And what if there are foreign keys? Foreign keys say that we have to make sure there's a value in another table; maybe we're doing an insert or an update to the sales order detail table, and we have to make sure that a sales order header record exists, so now we have to do a select against that table too, and that execution plan will be included in the execution plan of your DML statement. So your DML operations, your write operations, tend to have much more complex execution plans, because they have to take into account triggers, keys, check constraints, and all of those other things.
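The select-before-you-delete habit described above might look like the following sketch. The dbo.Orders table, the date predicate, and the expected row count are all hypothetical placeholders, not from the talk:

    -- Step 1: eyeball exactly which rows would be affected.
    SELECT OrderId, OrderDate
    FROM   dbo.Orders
    WHERE  OrderDate < '20100101';

    -- Step 2: run the delete with the same predicate inside an explicit
    -- transaction, keeping an escape hatch until the count matches.
    BEGIN TRANSACTION;

    DELETE FROM dbo.Orders
    WHERE  OrderDate < '20100101';

    IF @@ROWCOUNT = 42          -- the count the SELECT showed (placeholder)
        COMMIT TRANSACTION;
    ELSE
        ROLLBACK TRANSACTION;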
Back in the day, Mark Souza, a Microsoft person who is here, this was gosh, way back around the SQL Server 6.0 days, a long time ago, called me over and said, hey Kevin, I want you to take a look at this. Not in the sense that he wanted me to give advice or anything like that; in the sense that one friend might ask another to come to the window of the tall building they work in so they can both see the car wreck out in the street. What is it? He had a customer whose execution plans for their various reports and things like that would run to 30 megabytes in size, a single execution plan, and if you ever wondered what a 30-megabyte execution plan does to the plan cache, think of a car wreck; it's a horrible, horrible thing. I'll tell you the full story about how they resolved that later on, but it was really quite an interesting experience.

So, we're going to see if the plan exists in the plan cache; if it doesn't exist, we have to build it, and we use the query executor to hand it off to the access methods, just like before. Except things get a lot different now. This box was whited out earlier: all those ACID properties of transactions now put an enormous amount of overhead on SQL Server. If we could turn these things off, and some people try to with hints, we could go a lot faster, but then we'd introduce a lot of risk into the system. Two main areas of the transaction manager are invoked at this point. The first is the log manager, because in SQL Server nothing gets done as a write operation unless it is first written to the transaction log. It will be written, again, in a form that's not readable to the human eye, but there are some tools you can purchase, and some Transact-SQL commands you can use, to actually read the transactions that are in the transaction log. So that's the first thing that happens: it has to be written to the transaction log. There's tons and tons of advice out there about how to make your transaction log go faster, because if the transaction log stops working, if it fills up and doesn't have any more room to grow, no transactions can be written; it's all done. So it has to go to the transaction log first. It's interesting: the transaction is written first, and nothing has been changed in the database yet. We then have to go on and take all the steps to lock the records, or the pages of the tables, that we need, and then we invoke the buffer manager. We inquire of the buffer manager: where is all this data that I need, does it exist in the cache already? Just like with the execution plans for the select statement, if it's in the cache, I don't have to acquire it from the I/O subsystem; I'll go ahead and make those changes right there, on those 8K data pages that are loaded up in the cache. Does that mean it has been written out to the data files yet? No. So what is the name of a data page that has been changed but hasn't actually been written to disk yet? Yes, it is a dirty page, a very dirty, big dirty page. And that's all it is: it's different up there in memory than it is on disk. That's one of the reasons why people will say it's not always a good idea to use that NOLOCK hint.
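One of the commands alluded to for reading the log is the undocumented, unsupported fn_dblog table-valued function; a sketch for poking around on a test system only:

    -- Each row is one log record; Operation shows values such as
    -- LOP_INSERT_ROWS, LOP_MODIFY_ROW, and LOP_COMMIT_XACT.
    SELECT TOP (20)
           [Current LSN],
           Operation,
           Context,
           [Transaction ID]
    FROM   fn_dblog(NULL, NULL);   -- NULL, NULL = no LSN range filter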
A lot of times we're thinking, oh hey, we're going to be a lot faster with NOLOCK; well, you're also going to expose yourself to risk, because you're looking at things that have not been made fully permanent in the database, and they might be changed later on.

So there's another process. The lazy writer is the process that cleans up the plan cache; there's another process, and you can see its name there, called checkpoint. Checkpoint is responsible for keeping the data cache cleaned up, and again, it runs by default in SQL Server based on a setting in sp_configure, the configuration setting called recovery interval. You can change it yourself: you could go in and say, I don't want the checkpoint to run every minute, I want it to run every 5 minutes, so you change that recovery interval setting from 1 to 5, and it would only run once every 5 minutes. Or you can type CHECKPOINT as a command at the Management Studio interface, and it will flush all of those dirty pages out to disk right then.

In fact, a friend of mine, who is now a member of SQL CAT, the Customer Advisory Team, worked at the United States' biggest printer; they printed all the glossy magazines, Vanity Fair and Vogue and all those kinds of magazines. They put in a whole new mechanized system back in the early 2000s that was built on SQL Server. And by the way, if you ever buy a product and the salespeople say to you, you are our biggest customer, we're so excited: don't buy that product, because it's not going to work right. If they sell to companies of a different size than yours, and you're the biggest by far, things are going to be rocky; and that was their experience. The system would print on rolls of paper taller than Chuck, and Chuck is taller than me by several inches, several centimeters I should say; he couldn't touch the top of that roll of paper, it was a huge, huge roll of paper. It was a very, very long print line, and between the time it took for an operator to see that the paper was starting to turn sideways and maybe get messed up, walk to the end of the print line, and press stop, it would have printed through a whole roll of that paper. A huge, high-speed system, and at each step along the way it wrote records back to SQL Server. When they turned it on for the first time and started the run, SQL Server hit that one-minute mark: wow, that's a lot of transactions, whew, hard work. Then the next load came through at the two-minute mark: whoa, I can't keep up. By three minutes it was like, oh, I'm dead, Xs for eyes, it's all over. What was going on? They found out that the checkpoint was set to the default and was just throwing too many transactions at the configuration they had. So they figured: we change these rolls of paper all the time, every five minutes or so; let's set the recovery interval to the maximum, 120 minutes, and we'll manually issue CHECKPOINT ourselves whenever we change a roll of paper. It worked fine, went as fast as the subsystem could go, and didn't have any of those buffering issues. So you can control that.

All right, so once it finds the data it needs, it makes the changes on those 8K data pages and then flushes them to disk; at that point it's essentially considered fully written to the database system.
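The two levers from the printing-plant story, sketched in T-SQL ('recovery interval (min)' is an advanced sp_configure option, so it has to be exposed first):

    -- Stretch the automatic checkpoint target out to its maximum...
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'recovery interval (min)', 120;
    RECONFIGURE;

    -- ...and flush dirty pages manually whenever it suits the workload
    -- (for them, every time they changed a roll of paper).
    CHECKPOINT;

The trade-off is recovery time: the longer the interval between checkpoints, the more log there may be to roll forward after a crash.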
But what if it hasn't been flushed by the checkpoint and SQL Server experiences a massive failure; what happens to those transactions, are they lost? I was hoping for a resounding no; at least now I know you're not asleep. The ACID properties of transactions: when you start up your database the next time after a failure, it'll tell you, hey, I'm applying all those transactions, because they were written to the transaction log already, they just weren't written to the database. So I'm going to roll forward the 32 transactions that were close enough to being finished, that were fully isolated and consistent and durable, and there were 17 that weren't quite ready to go, so I'm going to roll all those back. It keeps everything in tip-top shape that way.

All right, so that's pretty much the architecture, the major components inside of the system. We could go deeper and deeper and deeper if you want to spend another day talking about it, and I'd love to do that with you: what is in the buffer cache, what is in the data files, what are these structures called IAMs and GAMs and SGAMs and all that sort of stuff. We could talk about all those other things, but most of us don't really need to know all of that. It is extremely important, though, just like with those pit crews, to understand how the engine works: what the major components are, what they do, and how they work. So let's review as we wrap it up. A: atomic, thank you, some of you are awake. C: consistent, yes. I: isolated. And D: durable, very good. How about the relational engine, which I had at the top; what are the three main components in the relational engine? The command parser, yes; the query optimizer, the big one, yeah; and the morbid one with the battle-axe, the query executor, right. Then in the storage engine, again, pretty much three main components: the buffer manager, the transaction manager, and the access methods, very good. And the cache had two main areas, two main areas but lots of little ones: data and plan, right, couldn't be simpler. Then the transaction manager had two areas in it as well, which sound kind of like you're at a rock and roll show: those were the lock manager and the log manager, right. You knew that; you were just wanting me to make an even bigger fool of myself, or you're sadists, I just realized that. And what are those two processes that control how things are written into cache and flushed out of cache? The lazy writer and checkpoint, very good. So some of you were outspoken enough to save the rest of you; you passed the test, very good.

All right, we have a couple of minutes for a few questions. Anyone care to ask any questions? Wow, yes. Are transaction log records placed into the buffer? No, actually, they go straight in; they don't spend time going up into memory and then back. There's a special transaction log manager, and there are some features in SQL Server that can access the transaction logs as they're written to the disks, whatever subsystem you've declared they go to; for example, transactional replication and change data capture use that, and I think possibly auditing also looks at it. Those log records are kind of sent through a pipe, you could say; it's very specially designed to go very quickly to that subsystem, wherever it is. It doesn't first go up into some area of the buffer cache or anything like that. Good question.
What else? Yes, in the back: is there a difference in the different protocols for SQL Server, TCP/IP, named pipes, and that sort of thing? There are differences, definitely. There are a couple of newish protocols, shared memory and VIA, which are much more high-speed, and I can't remember which of the two, I think one of them was specifically designed for an operator who's physically connected, standing next to the machine, so that it's much, much faster. Maybe if you're remediating a problem, trying to fix something that's horribly broken, you could use that protocol rather than TCP/IP, because TCP/IP has to put everything into packets and send it out and so forth. So that's kind of the idea: I'm inside or right next to the machine, so I want to do it all in that same workspace, not 300 miles away from the data center or something like that.

Yes: the relation between how old a plan is and how complex it is? Right, okay, well, to restate: the more complex the code, the more time it took to compile it and load it into the execution plan, and the bigger it is, the higher the complexity number is. Based on that complexity number, it's kind of given a lifespan, and if it gets used at any point before that lifespan expires, the number gets reset. So if it's a very short query that says select employee ID from the employee table where name equals such-and-such, that might have a number of 10, say; if after 10 seconds nobody else uses it, it's gone. But the end-of-year report for the financial office has a complexity value of 300,000, so we're going to let that stay for a long, long time before it gets aged out of the cache. That indicator is kind of the lifespan of that particular execution plan, and if anyone uses it before its lifespan has expired, it gets renewed and stays in a little bit longer. Does that make sense?

Time for one more question. Yes: where does the judgment come from on using memory versus tempdb? Well, of course, much of the time we specifically ask to use tempdb, because we've used a hash mark, you know, # or ##, and declared a temporary table. The other thing is that when we're at the access methods point, SQL Server will look at what you've asked it to do, and it makes all kinds of decisions about whether it needs to create temporary work tables. For example, if you ever have a SQL select statement that has a GROUP BY, there's probably a 60 or 70 percent chance that it's going to create a temp table in tempdb; if you have a GROUP BY with a HAVING clause, 100 percent of the time it will create a temporary table in tempdb to sort out that result set. So actually, if you spend a little time and learn what all of the different steps in an execution plan mean, you can look at your execution plans and know instantly that a query is creating temporary work tables: if you ever see a sort or a spool operation, a hash join or a merge join, you will almost always have temp tables created. SQL Server will usually choose that at the access methods step based on the work you're asking it to do in the query. You can also use a couple of perfmon counters to see what is happening: you can see how many objects are being created per second in tempdb.
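Those perfmon counters are also visible from inside SQL Server through a DMV; a hedged sketch:

    -- Temp tables and work tables/files being created. For '/sec' counters
    -- cntr_value is cumulative: sample it twice and diff to get a rate.
    SELECT [object_name], counter_name, cntr_value
    FROM   sys.dm_os_performance_counters
    WHERE  counter_name IN ('Temp Tables Creation Rate',
                            'Worktables Created/sec',
                            'Workfiles Created/sec');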
There are also specific perfmon counters for the access methods, the whole category of access methods, so you can see how many sort operations are being done, and, gosh, I'm trying to think what else. So there are metrics that let you see specifically, as time goes on, what's happening; but if you were looking at an individual query, it would be at the access methods point, based on whether the query execution plan shows it creating a temporary table or not. Very good question. Anything else? Yes, one last one.

Great question: with the transaction log, what's different about the different recovery models that you can use? I was actually hoping someone would ask this question, so you get the gold star. A lot of people think that because I've set my database to simple recovery mode, I'm not really doing anything with the transaction log, and in fact that's not true at all. Everything still has to go through the transaction log, because this is a relational database and we are governed by the ACID properties of transactions. All that simple recovery mode does for you is tell SQL Server: every time I do a checkpoint, truncate the transaction log and clear it all out. If you're in bulk-logged or full recovery mode, SQL Server allows that transaction log to grow larger and larger, so that you can recover to a specific point in time. When you make the conscious decision to use simple recovery mode, what you're basically saying is: I don't have really rigorous recovery needs, so I can just recover to last night's backup or something like that. But that doesn't mean the transaction log isn't being used and shouldn't be tuned, just like I was saying earlier; it will be used, and it needs to be tuned. The only difference is that it's being truncated, by default, every minute as the checkpoint passes; again, if you've changed that recovery interval setting in sp_configure, it could be a different number. But the log is accumulating in there, and there are times when a transaction gets kind of stuck in its throat, if you will: it doesn't finish, it doesn't resolve, but it has started. When that happens, the transaction log cannot clear out past the oldest open transaction. So if you have open transactions, even in simple recovery mode, it's still possible to have a transaction log that fills up your disk drive and makes your SQL Server crash, because a transaction got stuck and wouldn't complete. So, great question: all of the infrastructure is still the same, but when you set simple recovery mode, it simply says, try to keep the log as clear as you can as we go along.

All right folks, we're all out of time. I hope this was informative. Thanks very much; hope to see you soon.
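A footnote to that last answer: you can check both the recovery model and the reason a given log cannot clear from sys.databases; an open transaction shows up as ACTIVE_TRANSACTION even in simple recovery.

    -- log_reuse_wait_desc explains why the log is not truncating:
    -- NOTHING, CHECKPOINT, LOG_BACKUP, ACTIVE_TRANSACTION, and so on.
    SELECT name,
           recovery_model_desc,
           log_reuse_wait_desc
    FROM   sys.databases;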
Info
Channel: SQLugSWE
Views: 81,351
Rating: 4.8942456 out of 5
Keywords: Microsoft SQL Server (Database Management System)
Id: 34VqSliEfsc
Length: 66min 9sec (3969 seconds)
Published: Wed Nov 27 2013