SQL Server Index Tuning – Multi Column Seeking (by Amit Bansal)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi today we are going to talk about index tuning it's a huge topic so I should not get overwhelmed when I'm talking to you about indexes today I will specifically focus on multi column seeking when you create a non-clustered index as you might know it is a separate storage object and by the virtue of being a b-tree structure it has the root page the intermediate levels and the leaf level when you create a non-clustered index you want sequel server database engine to traverse through the index and seeking capability is one of the greatest performance benefit of creating the non-clustered index now as you know index by itself does not improve performance it just helps you get to the data faster and seeking is one way how you can get faster to the data so you traverse from the root page to the intermediate level and to the leaf level in many cases the optimizer might decide not to see but to scan which means you are scanning the leaf level of the data and when you scan you are touching all the records from you know the first page to the last page and of course scanning is going to be much more expensive than seeking now these are general stuff that I am talking about there are always exceptions to these cases that I'm mentioning and because the topic is so huge and sometimes the workload patterns the queries the different indexing strategies that you can do there they're so complicated that sometimes you just there are all these specific cases and exceptions but these are generalized stuff that I'm talking about in most cases are this is applicable now specifically when I talk about this multi-column seeking when you create a non-clustered index you could create a single column index or a multi column index single column means your index key is just one column so you let's say you create a non-clustered index on email address and you know put email address in brackets so this is a single column in index when you create a multi column index something like create non-clustered index on first name comma last name comma email address now you have three columns so your index key is actually a combination column one column two and column three now remember when you are searching on this index key value it's a left based subset so searching happens from left to right and in a single column index things are pretty straightforward you just have one column value to seek on but when you have multiple columns then things get a little more complicated so going by the rule of sequel optimizer here you cannot seek on the second column if you're not seeking on the first column now these are the tricky things that I want to show you with seek predicates and residual predicates etc and maybe this quick video and and the demo that I am going to show you will help you design better indexes so let's understand more about this concept with the help of a demo and as usual no more slides let's jump to the demos straightaway so in this demo I am going to use the adventure works database let's use adventureworks and there is a table here called person dot contact if I show you all the records of person dot contact that is first name middle name last name and then there are two columns email address and email promotion of course a couple of other columns that I am NOT going to use in this index demo I will use some of these columns out here back to the code let's make a copy of person dot contact as contact to so the moment you do this select star into you know that now person dot contact to so person is my schema contact to is a table and this is now a heap so there is this heap which means there is no clustered index on this table and of course there is no non-clustered index as well now first things first we will create a clustered index on this table and the column is contact ID which means now the data is going to be rearranged all the pages are going to be rearranged and are going to be sorted physically by this column contact ID so a quick reminder you have this table contact ID and now you have one and of course the only clustered index on this table now here is the query for you now this is like a small mathematical problem you have this select statement and you want to retrieve contact ID first name an email promotion from this table and you have filters rather predicates on all these columns first name like l % email promotion is equal to 1 and contact ID is less than 10,000 so mind you there are approximately 19,000 records let's for the sake of discussion assume that there are 20,000 records and these are the three predicates option max top one can be ignored and just for the sake of demonstration now if I do a help index on use the stored procedure to see what currently exists on contact 2 and as I said you just have the contact ID only the clustered index their zoom thing did not work so well anyway now let's look at so you have to do so what is the problem the problem at hand is that you have to design a non-clustered index for this query now remember there are there are tons and tons of best practices when it comes to designing indexes in sequel server and of course the first one is you should never design an index for just one workload or for just one query you should try to see that a particular index should benefit multiple workloads multiple queries but remember this is just a demo and I'm just trying to show you something about index tuning and about how the sequel query optimizer works so for the sake of this demo let's design a non-clustered index that gives maximum performance for this query and a reminder again you already have a trusted index on contact ID so what I did I have created a few indexing options and from option 1 all the way to option 13 I have created a couple of permutations and combinations that you could use to create on to create an index on this particular object contact ID now let's go by quickly by the round of elimination and I'm going to do that very very quickly remember you already have a clustered index on this table which is contact ID so when you create a non-clustered index the clustering key is automatically included in every non-clustered index remember even if your table is a heap the row identifier is automatically included in every non-clustered index so I need not create a non-clustered index and include contact ID there so that's the first thing so the second thing is a first name and email promotion so these are the only two columns I am left with if of course I don't have to use contact ID in my index key formation because it's already a clustered index key so I am only left with first name and email promotion now first name an email promotion these two columns the question is can I seek on these columns so if I look at the predicates first name like L percent can I seek on it yes I can seek on it if this was something like percent L then of course I would not be able to seek on it so this one I can definitely see it on and then email promotion equals to one can I seek on email promotion with this predicate yes I can see so I can actually seek on both first name and email promotion and what about contact ID well as I mentioned before contact ID is already included by default in the non-clustered index because that is the clustering key so I'm left with these two please first name an email promotion and if I can seek on both of them then I can be sure that I need not put them in include keyword now what I mean by include keywords so let's go back to the indexing options and you can see that let's say let's look at the this one this particular index option 2 I'm creating a non-clustered index on contact 2 and michaelis mine on the string key is first name this is my index key and I include email promotion this means that I am telling the storage engine that email promotion as as the column values should not be there in the root page and the intermediate pages it should only be available in the leaf level which means this is only for the purpose of retrieval typically we use this include functionality for covering index purposes which means us some of these columns on which you don't want to seek you can put them on include so that they are available for retrieval which makes this index as a covering index a covering indexes and index which covers the entire query and it might just cover one query and not others but as I as we discussed before these two columns the first name and email promotion we can actually seek on both of them so why put email promotion in include we should get benefited from the seeking capability we should take advantage of that so let's go by a quick round of elimination option one I don't want to use include I want to put both of them in my composite key formation so this is also called as a non-clustered composite index because I'm using multiple columns again include I don't need because I want to put email promotion there in the key formation first name I can seek on first name I should not put include again the member include again I'm such Google and you can learn more about include remember any column that you put and include you will not be able to seek on that column because that column value is not contained in the root page and the intermediate page contact ID I don't need contact ID in any of the indexes so wherever I see contact ID I'm just going to get rid of those options remember I have taken all possible permutations and combinations here and the moment I see contact ID I just don't need them do I need to create an index like this so these are all now composite options composite one two three four and five and first name email promotion contact ID so as I said I don't need contact ID so I can get rid of this option ten again I don't need option eleven again contact ID I don't need this leaves me with the two options now composite four and composite five composite four is first name comma email promotion and composite five is email promotion comma first name so really the difference between the two is the order of columns now remember the order of columns in your select query in your select list does not matter but the order of columns in your composite index key formation it does matter and it is very critical so first name comma email promotion the index key formation is not the same as email promotion comma first name now given these two indexes but of course both of them are in a way covering indexes because contact ID is already automatically part of both the indexes and your of course including first-name enemy and promotion also and then you can can you also seek on both of them so the point is let's go and create both a first name and email promotion so we can we'll create composite for as well as composite five and you know what let's go ahead can create both of them and let's see and let the optimizer choose whatever it wants so I'm going to go ahead and create both the indexes composite for and composite five gets created now remember composite for uses first name comma email promotion that's the index and composite five is email promotion comma first name there you go and this was the query and now let's go ahead and run this query and I will include actual execution plan to see which index is being used and I hope the zoom in works properly yes can i zoom in here okay fair enough I can zoom and I can show you a few things let's go and execute this is the climax let's execute and you get the data out and if we go into the execution plan you have the seeking here and what which object is being used here so the optimizer actually uses contact composite five and you can see that it does not use contact composite for now why has the optimizer chosen contact composite five over contact composite for first things first will the optimizer get and or will sequel server database engine get the same performance if it had to you contact composite 4 or is it that it has just randomly chosen contact composite 5 or is it the most recent index that we have created and that's why it gets chosen which means 4 and 5 both the indexes are same in terms of the searching criteria because both the columns are being sick the upon and both the columns are present in the index as well a lot of questions so why not do something we take this one and we take this one too and let's put them next to each other here and in this query I am going to force contact composite 4 with with 5 because this one is the optimizer is going to choose 5 and Here I am forcing 4 and is the performance difference same from the perspective of optimizer let's go and execute and execution plan is already turned on you get the data out and if we go into the execution plan and now the most interesting thing is here you are seeking and you get contact composite 5 and this one uses contact composite 4 because you use the index hint and you force the use of this index what about performance so if we go and zoom in a bit for me to show you the performance there you go so 5 used 35% and that's the performance and this one is 65% so that's the relative cost for the entire batch out of hundred percent 35% is being used by composite 5 and 65 by composite 4 so that is clearly a performance difference between 5 and 4 so what's really going on well let's go and try to understand the rule of the optimizer here in contact composite 5 which is let's go back and look into this one email promotion comma first name so why has sequel server chosen this one over for remember look at the query here the query email promotion uses a equality predicate and that makes all the difference here so the rule of the optimizer is you can seek on the second column only if there is an equality predicate on the first column going by that rule the optimizer is able to seek on both the columns first name an email promotion because contact composite five email promotion is the first column so email promotion is using equality predicate it's the first column so it can seek on email promotion and because it's an equality predicate it can seek on the second column which is first name in case of contact composite for where first name is the first column and it does not use the Equality predicate it uses the like operator it can seek on first name but it cannot seek on email promotion so seeking on the first column which is the C cable predicate email promotion equals to one in case of contact composite for actually becomes these residual predicate so residual predicate here is a predicate on which you are not able to seek and that's why I use these informal terms that the contact composite v is a fully c Keable index in case of this query in case of this example and contact composite fort is partially c cable and just to repeat again the optimizer is able to seek on both the columns because there is an equality predicate in the first one here email promotion and in case of contact composite for because the first name predicate does not use an equal to operator it can seek on first name but cannot seek on email promotion well this topic is much much more bigger than this quick demo and it is very interesting and I would have actually turned on statistics time and IO to actually show you the read performance and the execution time performance as well but yeah that's good enough for you to get the idea that yes in a multi column index there are some of these rules that we need to understand and how you can actually make your indexes fully seek able to get the best performance out of them remember this was a quick demo only focusing on one part there are as I said before there are many best practices and we just cannot generalize because indexes is such a thing that there are exceptions to everything that we are talking about and in the sequel community and family we always say it depends just as a light moment you know sometimes when there are so many scenarios to deal with so yeah every every scenario every example has to be taken in its own entirety to really understand the concepts well hope you like this one hope the devil was useful and you have learned something new here are a few bullet points about me in short I have been working with sequel server since 7.0 it has been more than 20 years and I still love working with the product there is a URL their sequel maestro's com slash amath - bun cell click on that and you can read and learn more about who I am and what my work is here is a quick snapshot of my work and some credentials I am a Microsoft Certified Master of sequel server also Microsoft MVP for sequel server and Microsoft has also honored me with Regional Director status I do spend quite a bit of time with sequel mistress in fact this is my day job and we have a very popular performance tuning video course a wonderful product hands-on lab learning kits and of course a popular sequel server he'll check service just visit sequel mistress com to learn more about these offerings apart from my day job I also spend time with the community I spend my time with sequel server geeks and data platform geeks we do lot of webinars and in-person events and one of our flagship initiative is data platform summit this is an annual conference which happens in Bangalore in the month of August and you can visit DPS 10.com to learn more about this a quick word about performance tuning video course so what I did was we had a popular master class on sequel server internals troubleshooting and performance teaming which I converted into a master series with more than hundred hours of deep dive content expert level modules both in terms of breadth and the depth and it is constantly updated you can subscribe to it and watch anytime anywhere as many times and the labs you can access them via sequel maestro's hands-on lab that bitly link bit least /a b video course is where you can go and learn more about the video course or just go to sequel mistress calm and you can explore the video course all the links are available in the YouTube video description a quick word about the annual conference as well if you will see this about Microsoft Data Platform as your artificial intelligence big data are cloud and of course including sequel server do not miss this conference this is one of its kind world's best Microsoft product engineers MVPs and M CMS come from more than 20 countries and they assemble in Bangalore in the month of August and deliver more than hundred sessions across three days and we also have full-day classes called data platform summit pre cons so this is a place where you should be visit dps 10.com to learn more about dps here is a quicker organizational structure of indominus equal Mistral's is something that i spoke about this is on the community front and there are other units expand sm erp which is an award-winning ERP solution from a dominar and our corporate training unit people were india call to action you can subscribe to this youtube channel of course you should do that you can follow sequel maestro's on facebook twitter linkedin it's just sequel maestro's and you can follow me on twitter a underscore bunsen and you can follow me on Facebook and LinkedIn as well the URLs are up there on the slide with this thank you very much hope this video was worth your time see you soon in another video
Info
Channel: SQLMaestros
Views: 12,084
Rating: 4.8766518 out of 5
Keywords: sql server, microsoft sql server, SQL Server Index Tuning, Indexing Strategies, Query Tuning, SQL Server Internals, Monitoring, Performance Tuning, Performance Monitor, SQL Server Administration
Id: nUpZRx7Sgdg
Channel Id: undefined
Length: 21min 32sec (1292 seconds)
Published: Mon May 13 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.