SQL Server Statistics Basics – Part 1 (by Amit Bansal)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi I am glad to be back with another video from sequel mistress calm and this time the subject is statistics it is a huge subject so I plan to record a series of videos around the subject but let me get started with the basics first so this is part one and of course there are going to be multiple parts to it I work for a company called Domino and under innominate we have multiple brands sequel miss rose is where I spend most of my time we deliver trainings and consulting to more than 200 customers globally masterclasses accelerators hackathons and a unique learning platform called hands-on labs are some of our notable offerings on the community front I spend my time with data platform geeks and sequel server geeks in the early years it used to be called a sequel server geeks but now because Microsoft Data Platform has moved beyond sequel server we have rebranded our community efforts as data platform geeks and if you go to data platform geeks comm you will see that we offer a lot of free training videos webinars in-person events and now our webinars are also being recorded and we host them online but one of the most notable offerings from data platform geeks for the community is the annual summit that we host every year in the month of August in Bangalore and that is called as data platform summit so as of this recording this year the fourth edition of data platform summit which is called as DPS 2018 has already been announced and three days of conference is happening from 9 to 11 August and two days of pre-con on 7th and 8th August it's happening at Hotel Plaza in Bangalore we get huge participation at this summit primarily because we have Microsoft data group product team members that fly down from Redmond and of course we see huge participation from global MVPs and M CMS so really our speaker the cream the elite speaker panel that we have is the draw for the audience so there's great opportunity to learn from these global experts and to network with them and we are very grateful and thankful to all our speakers over the years this event is hundred-percent learning and 0% marketing if you're keen to explore just visit DPS 10.com or just drop an email to contact at DPS 10.com and hopefully some will we'll get back to you yeah that's the URL you need to remember DPS 10.com my name is ahmed Munsell and i have been working with sequel server for more than 20 years now I started my career in the year 1997 and since then it has been a rollercoaster ride with sequel server and Microsoft Data Platform I started my career as a developer and then moved on to database administration and project management since 2006 onwards I have moved heavily into consulting and training role and I've worked with more than 200 customers globally my core area of specialization is sequel server performance tuning I am a Microsoft Certified Master of sequel and I also hold MVP credential for many years now apart from my day job whenever my sessions are selected and I get an opportunity I speak at global conferences and thankfully I have spoken at multiple decades multiple ignites sequel bits and even past summits and of course our own data platform summit I have found a data platform geeks.com in sequel server geeks.com in 2010 and since then it has been wonderful experience doing lot of community work globally that's the URL sequel maestro's comm /i math - Bunsen if you wish to learn more about me no more slides on the subject we will straight jump into action so let me switch on to the VM now first before I begin with any demo let me just quickly tell you something about statistics what exactly they are statistics is as an object inside sequel server and that object contains a lot of data and in simple terms it is data about the data when you submit a query to sequel server and the optimizer generates an execution plan and then execute that plan and sends the output to you one thing that happens inside that black box the optimizer is that execution plan and that execution plan is heavily dependent on statistics you always want a great execution plan an efficient execution plan so that you are your data is returned as fast as possible and statistics play a very important role here what exactly statistics to the execution plan is is actually comprised of multiple operators or more technically speaking I traitors and each I traitor has to deal with number of rows so if the optimizer knows the estimated number of rows that it has to deal with it we the optimizer can actually choose the right I traitor and in in turn choose the most efficient execution plan to know the estimated number of rows it depends on statistics and this is what we call as cardinality estimation which is estimated number of rows versus actual number of rows so statistics play a very very important role here of course sequel server can work without statistics also but in that case as you can understand the execution plan is just going to be a default plan based on some estimates and that might not be really good for your query so let's get started with this simple demo to understand the basics of how statistics work I'm using this database adventureworks 2012 and there is a table here called person dot person I create a copy of it as person dot person to when I select from this particular table and I will also turn on actual execution plan wherever needed when I run this I'm just simply doing a select star from this table I am trying to get all the records now when you tell the optimizer that you need all the data then the optimizer really need not bother because they just could be one single way of how you can get all the data just scan all the pages and send the output to the client when when I run this and when I look at SP help stats which let me find out if there are any statistics on this particular table I can when I run this I can see that this object does not does not have any statistics or indexes as I said simply because I ran select start from person dot person to but the moment I tell sequel server that you know give me all the data where last name is equal to bun cell and when I execute this of course I don't have any record here which is equivalent to bun cell as last name but something has happened behind the scene if I go and look into the help stats system stored procedure I can see that sequel server has now created a statistics object on last name and it did that before executing the query why because it actually wanted to estimate how many matching records could be there for last name equals to one cell so you could see in order for sequel server to generate an efficient execution plan it is dependent on statistics and if the right statistics object does not exist sequel server tends to create one automatically now automatic creation automatic updation is something that I want to talk about in this video so let me just drop this object back again and and run this again from scratch so I have dropped this table and I am creating this again once more and now let's scroll down further and get started so first thing I will show you that no object statistics object exists so it's all blank and apart from SP help stats I think starting from 2012 onwards Microsoft sequel team introduced another DMV called DM DB stats properties and if you join this DM V with the system catalog stats you can get a little more detailed and meaningful information about stats so I can see that right now I have nothing with person to now just a few seconds before you saw that when I ran the query with last name equals to bun cell cycle server automatically created statistics that happened because auto create stats database property was turned on and that's the default if I turn this off let me do that if I turn this off and then I execute this query which I say select star from person dot person 2 and I put my predicate on three columns first name middle name and last name and I run this I get that one record which matches the criteria and when I look into the execution plan you can see it's a it's a table scan which is good but when you look at SP help stats now you can see that sequel server has not created any stat objects simply because we turned off Auto create statistics likewise if I look at this DMV and join it with the system catalog says dot stats I can see that there are no objects and and going forward I'm going to use this particular query I won't use SP help stats just to call that out now let me turn on auto create stats so when I turn this on and remember sequel server I need to show you one more thing before I turn this on if I go to table sorry so I did not turn this on so let me run this again okay when I run this and I go into the execution plan you saw that there was a table scan and when I take my cursor over this first thing you will notice that there is a small warning symbol here and of course whenever you get such warning symbols in the execution plan it is advisable that you go and look into it and the warning here says that column with no statistics and this is very important whenever you are doing query tuning and if you have columns without statistics and the optimizer is doing some filter on those columns it is calling it out that you got to have statistics on these columns which are at the moment missing so that's one of the warnings but apart from the bond warning what I also want to show you that if I take the cursor over the arrow there I see actual number of rows as one an estimated number of rows as seven seven five now this is what I mean by cardinality estimation which is estimated number of rows is 775 and actual number of rows is 1 so the 775 is a default guesswork by the optimizer and there's a big difference between 775 and 1 and that's because there are no stats so optimizer definitely does not have any data to do any computation on which is really which is where things are going bad now I will turn on auto create stats let me turn it on and then when I run this query again of course I get that one record which matches the criteria but when you look into the execution plan the first thing you will notice is now you get a different plan altogether you get a table scan but you also get parallelism which is where the optimizer has chosen an efficient plan based on statistics now statistics do exist before I show you the stats object let me take the cursor over the arrow and show you the cardinality estimation now you can clearly see that actual number of rows is 1 an estimated number of rows is 1 so statistics here is doing an awesome job and cardinality estimation is perfect you may or may not always find perfect cardinality estimation because that's it end of the day help with approximate value and they may not be always 100% accurate and that's how mathematics and stats as a subject go when you look at the stats object now let's go and look into the stats object and you can see that three statistics objects are created now this is really interesting because you had one query but you have three stats object and I'm sure you have by this time figured out that three stats object are three objects are created because your filter you were filtering on first name middle name and last name so you had three columns did and there is one stats object for each column respectively if you look into the LP SP help stats you can see the keys there so first name middle name and last name this also brings us to a very important thing to note here that sequel server can automatically only create single column statistics it cannot or does not create multi column statistics if you want multi column statistics either you can create them on your own with create stats statement or when you create a multi column index so sequel server to support that index would create multi column statistics now because these stat objects were created automatically the up the engine cycle server has given some system names to it so you could see Louisa's triple some five six seven look at the order there and some hexadecimal values so this is an auto-generated name five six seven here represent the ordinal number of the column so this is fifth column sixth column and seventh column and this is 5 0 9 0 EF d7 is actually the hexadecimal value of the object ID and this of course stands for system generated stats and W a as some people say stands for Washington I don't know how true that is anyway now you could always verify that so if I do a select star from person dot person - you can see 1 2 3 4 5 so 5th column first name 6th column and seventh column what's the object ID for % dot person - ok it's 1 3 5 1 so if I just simply go and take the calculator and I will type one three five one six seven five eight six three and we do a hex of that 5:09 0e f d7 well 5 0 9 0 EF d 7 that's precisely was the hexadecimal value and you can tell you that sorry there yeah so that's how the that's how you could decode these that's name there man but that's not very important you need to understand how stats work and how how important they are ok so that's how you could see about the creation of stats so let me quickly summarize you should ensure that auto-create stats is turned on it's a good it's a best practice and unless you have some very very good reasons to turn them off but in most typical sequel server deployments you will find that auto-create stats is turned on and when they are on sequel server will automatically create statistics whenever the optimizer has in need of one and it can automatically only create single column statistics now I am NOT talking about indexes and stats behind the indexes etc and that's why I said we can have multiple parts to this video let me quickly talk about updating statistics now so as you know one these tats get created and note that the stacked object gets created before generating the plan because the because the optimizer of course needs these statistics data to create the plan and now over a period of time of course the data can change so when the data changes sequel server has to go and update the statistics but sequel would not just update the stats object every time in any time there is a very simple formula there 20% of the data needs to change 20% plus 500 rows that's the basic formula that has been put into the engine there are many many caveats to it some trace Flags some new improvements and enhancements and in some latest versions of sequel server I'm not discussing that right now in this video so I'm sticking with that basic principle of 20% plus 500 rows that need to change before sequel server can automatically update the statistics object so if I select from this table I can see that the total number of rows is 1 9 9 7 2 you can look at the right corner there okay so that's 19 thousand nine hundred and seventy two rows and if I do 20 percent plus 500 of that I get close to 4 4 9 4 so let's say let's say 4,500 rows so if I have in total 4,500 rows changing in might in my table then stats is going to be updated automatically now before I show you a quick demonstration of how stats are going to be updated let's go and look into the stats object once more and focus on some important columns out there so you know three stats objects were created this is the time when they will last updated so in fact they were they were never updated they were only created once as part of my demo here and you can see all of them almost has the same time which is eight hour 29 minute 47 and then some milliseconds out there now you can see there is a column here called modifications which means after this stack of these stats objects were created we did not do any modifications which means no data in any of these three columns has changed so the modification counter stands at zero now we discussed that we need to have at least 4,500 rows to be changing let me just go back and run this again to get the number which is four four nine four at minimum you need this so just to play around with this I am trying to update now all the records not all the records four four nine zero records because business entity ID as I have seen the data is the serial integer of one and goes all the way to 19,000 something so I'm trying to update 4490 records which just which is just slightly less than 20% so when I do this update I see three six eight five rows changing actually not four four nine zero which means not enough data but any which ways and then when I run the stats object DMV query I can see that three six eight five rules have been modified so that's the modification counter now remember this modification counter is actually based on on the column data which is one of the column data has changed which is first name right so this is this one now you will notice that a fourth statistics object has been created and by this time you would have understood why that has happened it's it has happened because of of this first column which is if you see look at one here which is because you ran this query which was based on business entity ID so now that your query has a predicate a filter on business entity ID sequel server automatically creates a stats object for this so let's continue with three six eight five but you know you actually need four thousand five hundred years I didn't really need to update a few more rows so now what I will do is I am simply going to run my query where is my query by the way yeah this is the one so if I if I simply run this query I mean you don't get any out because I changed everything to last name as one cell but after you run the query and you look at the stats object again and you can see absolutely no change the that's timing if you see is zero eight 29:47 is still the same stats objects have not been updated so you really got to have 20% plus 500 rules that that needs to change so what I can do is just manually update maybe let's say another thousand rows okay and we will fire and let me change this to let's say I'm at two and we execute this and this 645 645 is not going to be good enough so I will change this to I'm at three another few modifications that sounds good now how many rows have we changed so let's go and look at the stats object in total for nine seven five four nine seven five is good looks like so if I just put four 975 here is it more than 20% for nine seven five of 19,000 772 that's a good 25% of the data that has changed good enough for stats to update stats to kick in now the most important thing to note here is 25% of the data has already changed but stats object has not been updated so you can see that the last updated time is still 8 29 47 it has not been updated now this is where many sequel several professionals have confusion that you know my data has changed and it has more than 20% 30% or 40% but stats objects are still not updated well the stat objects will be updated only if the optimizer needs that particular column again or that particular column is encountered again in any given query which means until and unless I don't run this again or I think whichever column was it first name until analyst I don't use first name in any of the queries the static objects are not going to be updated now when I run this query what sequel server will do sequel server will hit this particular column and it will see that the modification counter is more than 20% plus 500 records it's going to update the statistics and then with the updated stats data it's going to generate an execution plan so when I execute this I just press that five I executed this and you can see okay no data that's fine but now when you look at the stats object executes and now you can see that it's all back to zero because stats has been updated and this one got updated which was our column number five first name and you can see it's time now it's eight forty one zero two and the others are whatever they are 841 so this is how you see that automatic now just like you know this property that we had which is auto create stats we have auto update stats and that is also turned on I'm sorry what's wrong yeah auto update stats so this is also turned on by default so both or to create stats and auto update stats are turned on by default and it's a best practice to keep them turned on but in a couple of sequel server deployments I have seen that auto update stats is turned off because some DBS tend to updates that manually using some maintenance plans or weekly or daily or bi-weekly scripts which is fine you know your environment bests how to deal with it but in this video I just wanted to show you how the basics of statistics how they're created and how they are updated do I have anything more okay not much let me jump back to the presentation well so I hope this demo was useful and the intention was to just do quick basics about sequel server statistics if you're interested you can always visit sequel mistress comm and connect with us on Twitter Facebook or YouTube and I am personally available on a midburn Soldotna my twitter handle is a underscore Munsell and sequel mistress is at the rates equal miss rose oops yeah and of course I talked about some of our training and hands-on lab etc just drop a mail to classes at sequel maestro's dot-com and someone from our team can get back to you in case you want to explore learning more from us I want you to pay some attention to this hands-on lab project which we started two years back today in this library we have more than 120 labs each lab document consists of three to four exercises and it's a long document of 50 to 60 pages and it expand you can practically learn a particular concept step by step it's it's a great stuff go ahead and create a free account for yourself and try out some of our labs we try to cover the length and breadth of Microsoft data platform not only sequel server so you have as your stuff you have machine learning you have data science stuff you have as your data factory you have cosmos DB and of course latest versions of sequel server and and lot more so give it a try I also invite you to join Data Platform geeks comm videos like this recorded webinars free in-person events and live webinars there's lot out there as a member that you can benefit from so do visit Data Platform geeks.com and become a member in case you have more questions please join these groups there's one group that we have on Facebook and another one on LinkedIn and a lot of experts out there will answer your questions and you're most welcome to answer questions yourself in case you no answers from from the community questions well before I wind up and the quick reminder for you to visit DPS 10.com and if you if you visit DPS then if you are joining this conference and if you learned about this conference from this video please come and meet me or our team and let us know that you learned about this conference from our videos thank you very much for your time I hope this video was useful in your learning endeavors I'm available on twitter at a underscore bun cell do follow me there and hope to connect with you and next video thank you very much and you have a great day
Info
Channel: SQLMaestros
Views: 21,862
Rating: 4.8255033 out of 5
Keywords: sql server, microsoft sql server
Id: nqAOF1DFpyU
Channel Id: undefined
Length: 26min 33sec (1593 seconds)
Published: Mon Mar 05 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.