Working with Very Large Tables Like a Pro in SQL Server 2016 Part 2

Captions
We're recording. Good afternoon, my name is Julie Bloomquist and I'm the moderator today for the PASS Database Administration Virtual Group. Today we're going to have part two of "Working with Very Large Tables Like a Pro in SQL Server", and our speaker is again Guy Glantser. This presentation is being recorded and will be posted on the virtual DBA chapter archive page starting tomorrow.

We are part of the PASS organization. PASS BA Day is coming up in San Diego; you can register, and there's still a discounted rate of $475. The PASS Summit is coming up October 31st to November 3rd in Seattle, Washington, and if you register with the code VC15PS6 you'll receive $150 in savings. As I said, we are one of the virtual chapters, and there's a large number of them; you just go to your PASS account and associate it with another virtual group or even a physical chapter, and you will only receive emails about the meetings that are coming up. The 2017 PASS Directors-at-Large elections are opening up, with three seats; if you're interested, you can contact the board at governance@pass.org. PASS is looking for visionaries to serve on the PASS Board. We are also looking for nominations for the PASSion Award; please submit those between July 8th and September 1st. These are for outstanding individuals who volunteer and who are passionate about PASS and SQL Server. SQLSaturdays are one-day mini conferences, and these are the ones coming up around the world; usually there's just a nominal fee to cover lunch. Here again, you can join a local in-person chapter or associate your account with a physical chapter. There are a large number of volunteering opportunities with PASS, and you can also submit nominations for the volunteer recognition, the Outstanding Volunteer Awards. If you discovered the DBA Virtual Chapter on YouTube, that's great, but we would love you to become a PASS member. It's free, and as I said, you will only get the emails about our meetings, and you'll be able to download the supplemental materials, for example many of the scripts. Here is how you can stay connected with PASS.

Once again, we have Guy Glantser. He's a Data Platform MVP, the leader of the Israeli PASS local group, and the CEO and founder of Madeira Data Solutions. His career has been focused on the Microsoft data platform for over 20 years, performing various database roles as either an on-site DBA, an external consultant, or a speaker. Guy is involved in many activities in the Microsoft data platform community; he occasionally speaks at community events such as the PASS Summit, SQLBits, SQLSaturday, and user groups around the world. He co-hosts the SQL Server Radio podcast. So I'm going to switch over — here you go, Guy. Okay, Guy, you should be presenting, and I do see your slide deck. Excellent.

Thank you, Julie, and hello everyone. My name is Guy, and this is part two of "Working with Very Large Tables Like a Pro in SQL Server 2016". If you didn't attend the first part, let me just say a few things. One of them is about SQL Server Radio: if you haven't heard about it yet, the SQL Server Radio podcast is something I'm really proud of. Matan Yungman and I talk about SQL Server, communities, professional opportunities, and projects we are involved in, and we have guests that we interview once in a while. It's really nice, so if you're interested in SQL Server, and I guess you are or you wouldn't be here, tune in at sqlserverradio.com.
From the previous part of this session I got some really great feedback, things like "excellent, this was very helpful, I look forward to the continuation of this topic" (which is now), "I enjoyed it and learned from it, thanks for the demos", "made me rethink some things about partitioning", and more like that. So thank you very much for the feedback; it's really heart-warming, and it gives you motivation to do this more, to speak to people about these topics and to teach, when you get this kind of response.

I want to quickly recap what we did in the first part and then move on to the second part today. The agenda was to talk about very large tables in SQL Server: how we work with them, how we do it like a pro, how we do it efficiently, and to cover some new features in SQL Server 2016, and also 2014, that enable us to do it better.

We talked about what a very large table is. Is this a very large table, or maybe this one? The answer, of course, is always "it depends", and my answer is that a very large table is one that requires special care, meaning you need to do things in a different way: the way you manage the table, the way you operate your different processes, maintenance for example, or your data load. When the table becomes large to the point that you need to do things differently, that means it's a very large table, at least for you. For one person in one environment a table of a few gigabytes might be considered a very large table; for another person in another environment maybe only a five-terabyte table would be considered very large. So it depends.

When I talk about very large tables, and specifically in this demo, we have the PageViews table. What I mean here is a table that has some kind of ever-increasing column, like a datetime column; it's a log table or an events table, in this case page views. It gets lots of inserts, rows are inserted very quickly, the table becomes larger and larger all the time, and that brings challenges in querying, managing, and maintaining it. So in this case we're going to use the PageViews table.

When we talk about working with very large tables, I cover four aspects: managing the table, querying the table, updating it (and when I say updating I specifically mean loading data into the table; in most cases these tables are insert-only, there are no updates, sometimes there are, but I'm going to talk about inserts in this case), and maintaining the table.

We talked about managing the table in the previous session, and we specifically talked about table partitioning. It would be very wise to partition your very large table. It facilitates the data archiving scenario, which is the main reason for partitioning: if you want to archive old data from your very large table, either moving the data aside to a different location or just deleting the old data, it is much easier to do with partitioning, using the merge, split, and switch partition operations. You can just switch out, let's say, the oldest date in your table, assuming you have partitioned by date, and it can be just a metadata operation: no resources, no downtime, no performance issues, very easy to do (a rough sketch of this sliding-window pattern follows below). You can also spread partitions between different storage tiers: if you have, for example, faster and slower disks, you can put your recent partitions, your hot data, on the fast disks and the older partitions on the slower disks, because each partition can potentially be located on a different filegroup.
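As a rough sketch only of the sliding-window pattern described above — the table, partition function, scheme, and archive table names here are illustrative assumptions, not the demo's actual objects — the switch/merge/split sequence looks roughly like this in T-SQL:

-- Assumes dbo.PageViews is partitioned by date and that dbo.PageViews_Archive
-- has an identical structure, indexes, and constraints and sits on the same
-- filegroup as partition 1.

-- 1. Switch the oldest partition out of the table (a metadata-only operation).
ALTER TABLE dbo.PageViews
    SWITCH PARTITION 1 TO dbo.PageViews_Archive;

-- 2. Remove the now-empty boundary point from the partition function.
ALTER PARTITION FUNCTION pf_PageViewsByDay()
    MERGE RANGE ('2017-01-01');

-- 3. Add a new boundary point for the next day (prepare the next filegroup first).
ALTER PARTITION SCHEME ps_PageViewsByDay NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_PageViewsByDay()
    SPLIT RANGE ('2017-08-10');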
That's another thing you can do. Partitioning also improves table maintenance; we will talk about maintenance today and see some features related to partitioning that you can use for table maintenance. I also said that partitioning does not necessarily improve query performance, and I got some feedback about that. Partitioning was originally designed to facilitate management and maintenance of the table, not to improve query performance. In some cases you might get improvements to query performance, and in some other cases you might even degrade performance when you use partitioning. It really depends on the query, how you write the query, and on the table design, how it is structured, but in most cases, at least in my experience, you will not get query performance improvements. I see a lot of clients and other people implement partitioning and then get disappointed because they don't get the query performance they expect. You shouldn't expect great performance improvements from partitioning; you should expect great improvements in how you manage and maintain your table, and I'm going to show that later. One of them, of course, is again the sliding window, the archiving scenario I talked about earlier.

We also talked about stretching tables to the cloud using the new Stretch Database (stretch table) feature in SQL Server 2016. What this means is that you can take a table and enable it for Stretch. SQL Server will create for you, or you can create in advance, an Azure SQL Database in the cloud, and then SQL Server will take the old data, based on your policy, on your definition of what old data means, and asynchronously transfer it to the database in the cloud. Then you have only the recent data, the hot data, locally, and the cold data in the cloud. The nice thing about it is that it's completely transparent to the applications, so you can query your table as if it were all local. Assuming that most of the time you just want to access the hot data, you're not going to feel the difference; once in a while, when you need to access cold data, SQL Server is going to fetch it from the cloud, so you'll have some latency, but it's transparent, it works very well, and it's very easy to implement.

To set up the Stretch DB feature, you run the Stretch Database Advisor first; you don't need to, but it's recommended. It will tell you which of your tables are candidates for stretch tables and whether you have any limitations. You need to enable "remote data archive" at the instance level. You need, of course, to create the Azure SQL server in the cloud, or choose an already existing server, and create a firewall rule to allow access from your SQL Server to the cloud. You need to create a database master key locally; this master key will encrypt the credential used to access the Azure SQL Database server. Then, of course, you create a database scoped credential to access the server. You need to enable remote data archive at the database level as well. Optionally, you can create a filter function; the filter function tells SQL Server what data is considered cold, what data should be moved to the cloud. If you don't create a filter function, then the entire table will be stretched to the cloud, which is also an option. Then you enable remote data archive at the table level, and you're done, you can go with Stretch DB. That was the management part from last time; a hedged sketch of these setup steps follows below.
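As a rough sketch only of the Stretch setup steps just listed — the Azure server name, credential, column name, and filter function here are illustrative assumptions, not taken from the demo:

-- 1. Enable the feature at the instance level.
EXEC sp_configure 'remote data archive', 1;
RECONFIGURE;

-- 2. Create a database master key and a credential for the Azure server.
USE VeryLargeDB;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
CREATE DATABASE SCOPED CREDENTIAL StretchCredential
    WITH IDENTITY = 'stretch_admin', SECRET = '<azure login password>';

-- 3. Enable Stretch at the database level, pointing at the Azure server.
ALTER DATABASE VeryLargeDB
    SET REMOTE_DATA_ARCHIVE = ON
        (SERVER = 'myserver.database.windows.net',
         CREDENTIAL = StretchCredential);

-- 4. Optional inline filter function defining which rows are "cold".
CREATE FUNCTION dbo.fn_StretchPredicate (@PageViewDate DATETIME2)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS is_eligible
       WHERE @PageViewDate < CONVERT(DATETIME2, '2017-01-01');

-- 5. Enable Stretch for the table, migrating rows that match the filter.
ALTER TABLE dbo.PageViews
    SET (REMOTE_DATA_ARCHIVE = ON
         (FILTER_PREDICATE = dbo.fn_StretchPredicate(PageViewDate),
          MIGRATION_STATE = OUTBOUND));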
We also talked about querying the table, and specifically about the ascending key problem. What I mean by the ascending key problem is this: we have this ever-increasing datetime column, and let's say the table contains seven years of data while most of the time users only query the last week. Then we have an issue with statistics. At some point in time the statistics get updated, so everything works well, but based on the default threshold of SQL Server before 2016, which was 20%, you need to have 20% of the rows modified before statistics are automatically updated in a database by default. Let's say you have a one-billion-row table: you need 200 million rows modified before statistics are auto-updated, and that might take a while. So a week later, according to the statistics, there are no rows at all in the last week, because they haven't been updated since, and when users query the table for the last week, the optimizer gets it wrong: it estimates zero rows in the last week, while in reality there are maybe a million rows or whatever, and then you get poor plans and poor performance. It's a very common problem.

We talked about some solutions. One of them is to update the statistics manually: don't rely on the auto-update statistics mechanism of SQL Server; you can update statistics manually, you can have a job that does this, say daily. This might be a problem if you want to do it with a full scan and the table is really very large; it might be too heavy, it might take too much time or too many resources from the server. If you do it with a sample rather than a full scan, it might not be accurate enough; on the other hand, it might work. So that's a viable solution.

Before 2016 you can use trace flag 2371. This changes the behavior so that the 20% threshold becomes a dynamic ratio that gets lower as the table gets larger, so for very large tables the percentage will be much lower and the frequency of statistics auto-updates will be much higher. In 2016 this is the default behavior, so you don't need this trace flag anymore. Before 2016 I recommend this trace flag always; I think that in all cases it is better than the default 20% behavior.

Since 2014 we have a new cardinality estimator, which is the component within the optimizer that estimates the number of rows in the plan, in the query, based on the statistics. It was rewritten in 2014 and also improved in 2016, and one of the things that was improved is the ascending key problem. The new cardinality estimator will recognize the pattern of the ever-increasing column, and it will not just rely on the statistics alone; it will try to estimate how many rows are in the last week even if the statistics say there are zero rows. It will not be as accurate as updated statistics, but it will be much better than just saying zero. That's another thing you get out of the box when you upgrade to SQL Server 2014 or 2016. I also said in the last session that this new cardinality estimator has some drawbacks: in some cases you might get a worse plan than before, so you have to be careful here too. When you change the compatibility level of your database to 2014 or 2016, you have to monitor your query performance and make sure that nothing bad happens, and if something does, you need to either use a trace flag to revert to the previous cardinality estimator or optimize the query and make sure that it runs okay.

We also talked about filtered statistics; this is something that has existed since, I think, SQL Server 2008. What you can do is dynamically create a filtered statistic only on the last week. This will run much faster and it will be much more accurate, and before 2016, as long as you use hard-coded values, SQL Server can recognize that the filtered statistic covers the range of values you need, use it, and give you a much better estimate. In 2016 it's even better: you don't have to use hard-coded values, you can use non-deterministic expressions, you can use parameters, and in all those cases SQL Server 2016 can still use the filtered statistics, which is a great improvement. Of course you need to maintain this solution: you have to create a job, for example, that runs daily, drops the previous filtered statistic, and creates a new one for the last week, and so on. It requires some development and maintenance, but it works very well and it's a very good solution. A rough sketch of such a daily job follows below.
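A minimal sketch of the daily filtered-statistics approach just described — the PageViews table, the PageViewDate column, and the statistic name are illustrative assumptions, not the demo's exact script:

-- Drop yesterday's filtered statistic if it exists, then recreate it
-- so that it covers only the most recent week of data.
IF EXISTS (SELECT 1 FROM sys.stats
           WHERE object_id = OBJECT_ID(N'dbo.PageViews')
             AND name = N'stat_PageViews_LastWeek')
    DROP STATISTICS dbo.PageViews.stat_PageViews_LastWeek;

DECLARE @From DATE = DATEADD(DAY, -7, CAST(SYSDATETIME() AS DATE)),
        @Sql  NVARCHAR(MAX);

-- The filter predicate of CREATE STATISTICS must use literals,
-- so the statement is built dynamically with the computed date.
SET @Sql = N'CREATE STATISTICS stat_PageViews_LastWeek
             ON dbo.PageViews (PageViewDate)
             WHERE PageViewDate >= ''' + CONVERT(NVARCHAR(10), @From, 120) + N'''
             WITH FULLSCAN;';

EXEC sys.sp_executesql @Sql;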
I think this is where we stopped last time, so we'll continue now with incremental statistics, which is a new feature since 2014, and then we'll move on to the other topics. Are there any questions so far?

[Julie] Hold on. "Technically, partitioning can improve performance if the partition key column is used in the WHERE clause." That's from Tim Cartwright.

[Guy] Yeah, so again, I know in some cases it can improve performance; I'm not saying it will never improve performance. I'm just saying, from my experience, in many cases clients are disappointed because all they expect is improved query performance, and they don't get the improvement they expect. This is not what partitioning was designed for originally.

[Julie] That's it for questions.

[Guy] Okay, great. So that was just the recap of the previous session; now let's go to the demo. In the demo, what I'm going to do now is simulate the problem again: I'm going to delete rows from the last week and then update statistics, but the new feature now is that when I update statistics, or when I create statistics manually, I can add INCREMENTAL = ON; this is an incremental statistic. What this means is that statistics are now per partition, and this is another great benefit of partitioning — again, not a query performance improvement in itself, although it will affect query performance. The feature is that statistics can now be maintained per partition. It is still a single statistics object for the entire table; you don't now have a separate statistics object per partition, it's just one object for the entire table. But two things change. One is that statistics can be updated per partition, so you don't have to update the entire table; you can update only a specific partition, which is a much faster operation, and it will merge the data for that partition into the single statistics object for the table. The other is that the threshold, 20% for example, now operates at the partition level, so as soon as 20% of the rows in a specific partition are modified, you get an auto-update of statistics for that partition. A minimal sketch of enabling incremental statistics follows below.
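A minimal sketch of enabling incremental statistics as described above, assuming the illustrative partitioned PageViews table (not the exact demo script):

-- Rebuild the statistics for the whole (partitioned) table once, marking
-- them as incremental so that they can later be updated per partition and
-- so that the auto-update threshold applies per partition instead of to
-- the whole table.
UPDATE STATISTICS dbo.PageViews
    WITH FULLSCAN, INCREMENTAL = ON;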
So now we have fresh statistics, updated as of last week, because I deleted the rows from the last week. Now, when I insert rows for the last week — we go "back to the future" — previously, in the first session, we saw that this didn't trigger an update of the statistics, because we had that 20% threshold on the whole table. If we look now — I run this query all the time: give me the rows from the last week — and we look at the execution plan, we now see a good estimation compared to the actual number of rows, whereas before we saw an estimate of just one row, because the statistics weren't updated. The reason we have a good estimation now is that the statistics are now updated; we can see that the last row in the histogram is now yesterday. The threshold of 20% is now per partition, and because we inserted rows for the last week, we actually inserted a lot of rows into the last seven partitions, and within each partition, which was originally empty, we of course exceeded the 20% threshold, so statistics were updated for those partitions — but only those partitions, not the entire table. So it's much more accurate now.

If we insert just, say, three hours of data from yesterday, which is what I'm doing now, and you remember the numbers more or less, you can see that this number here has increased but the number here hasn't changed. The reason is that three hours of data, in this case, are not enough to cross the 20% threshold and update the statistics.

If we look now at all the partitions in the table — these are all the partitions, we have 181 partitions, and the number of rows in each one — what I want to show you is that partition 178 is yesterday, 179 is today, and this one is tomorrow. So, for example, what I can do with incremental statistics is update statistics only for yesterday, and this is the syntax: WITH RESAMPLE ON PARTITIONS. Notice that you have to write RESAMPLE, because you can't change the sample rate for a specific partition; it's part of the single statistics object. What this statement does is update statistics manually for only a single partition. So even if you don't want to rely on auto-update of statistics, when you use incremental statistics you can now have, say, a daily job that updates only the recent partitions with a full scan instead of the original sample or full scan of the whole table; this will be much faster and more accurate. Now you can see that it was indeed updated, and the query now has a better estimation — a bit better than before, 191 versus 183. And one last thing: you can change a database property and set AUTO_CREATE_STATISTICS to ON with INCREMENTAL = ON, to instruct the database that when it automatically creates statistics on a partitioned table, it creates them as incremental statistics. A rough sketch of both statements follows below. So, to summarize, those are all the options to improve query performance regarding the ascending key problem.
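A minimal sketch of the two statements just mentioned, again with the illustrative table and database names (partition 178 is "yesterday" in the demo):

-- Update the single incremental statistics object for one partition only.
-- RESAMPLE is required because the sample rate cannot differ per partition.
UPDATE STATISTICS dbo.PageViews
    WITH RESAMPLE ON PARTITIONS (178);

-- Tell the database to create auto-created statistics on partitioned
-- tables as incremental statistics.
ALTER DATABASE VeryLargeDB
    SET AUTO_CREATE_STATISTICS ON (INCREMENTAL = ON);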
Do we have any questions about this? — No questions. — Okay, great.

So the next section is the data load. Here it is. What I'm going to do now is simulate inserts into the table, and I'm going to do it a million times: begin a transaction, insert a single row into the PageViews table, commit the transaction, and do it again and again and again. Let's run this, and I want to look at several things. First, let's look at Performance Monitor. I want to look at SQL Statistics: Batch Requests/sec; this will give me the throughput, how many batch requests per second. Under Databases we have Log Flushes/sec, and of course I need my very large table demo database. So this is Log Flushes/sec, and now Batch Requests/sec — this is better now — and if we look at it per second we see something like 2,000 batch requests per second.

I also want to look at the wait statistics. What I'm doing here: this statement clears the sys.dm_os_wait_stats DMV, and then I'm going to run this SELECT statement. It selects from sys.dm_os_wait_stats, first filters out all the non-interesting wait types, and gives me the top waits on the server. We can see that by far we have the WRITELOG wait type, more than anything else on the server right now. What the WRITELOG wait type means is this: when we commit a transaction, SQL Server has to take the transaction log content from the log buffer and flush it into the transaction log file on disk, and this is when we see the WRITELOG wait type. It creates some latency: we have to wait for SQL Server to finish writing to the transaction log file on disk, and it is the main problem when we have a lot of insert statements, a lot of inserts into the table, like in this case and like in many cases. We're going to have a lot of WRITELOG waits, this will create latency, and this will limit the throughput, the batch requests per second, that we can achieve. If we want to increase the throughput, we need to see how we can reduce this wait; there are all kinds of ways to do it. Let me just stop this. A rough sketch of this load loop and the wait-statistics query appears a bit further below.

One of the options to improve the WRITELOG wait is delayed durability, a new feature in SQL Server 2014, so let's talk about it for a few minutes. Delayed durability is about the durability of transactions. With transactions we have four properties: atomicity, consistency, isolation, and durability. Durability means that when you commit a transaction, SQL Server guarantees that the content of the transaction is durable on disk, it's persisted. If something goes wrong right after you get a commit, you're okay, the transaction is safe; when SQL Server does recovery at startup, it will roll the transaction forward, you'll have it, and you will not lose anything. That is durability.

With fully durable transactions, which is all we had before SQL Server 2014, here is what's going on. We have a request, an insert or an update or whatever — let's say inserts. What happens is we update the relevant data page in the buffer pool, and we also write a transaction log entry into the transaction log buffer in memory first. Now this is all in memory; as soon as we issue a commit, we first need to flush the transaction log buffer into the transaction log file — this is where we get the WRITELOG wait type — and only then do we get the commit. Let's say we have another statement: it goes into the buffer pool and into the log buffer, we issue a commit statement, the content of the transaction log buffer is flushed to disk with the WRITELOG wait type, and we get a commit. A third time: WRITELOG, and then commit. At some point in time we have the checkpoint process, which flushes all the dirty pages from the buffer pool to the data file, but that is a separate process. So this is how a fully durable transaction behaves.

With delayed durability, when a transaction has the delayed durability feature enabled, what happens is: we issue a statement, it goes into the buffer pool and into the transaction log buffer, and when we commit we get the commit acknowledgement immediately. That means locks are released and the transaction is completed, committed, and we can move on — before the transaction log entry is persisted on disk, because it's delayed. Now we have a different transaction and another commit, a different transaction, another commit, and at some point in time, for example when the transaction log buffer fills up, the contents of the log buffer are flushed into the transaction log file — but this happens asynchronously. No session has to wait on the WRITELOG wait type, because we get the commit instantly. This is how delayed durable transactions work.
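For reference, a rough sketch of the kind of load loop and wait-statistics check described above — the PageViews column names (PageViewDate, Url) and the simplified wait-type filter are illustrative assumptions, not the demo's exact script:

-- Simulate a high-rate load: one single-row transaction per iteration.
SET NOCOUNT ON;
DECLARE @i INT = 1;
WHILE @i <= 1000000
BEGIN
    BEGIN TRANSACTION;
    INSERT INTO dbo.PageViews (PageViewDate, Url)
        VALUES (SYSDATETIME(), N'/home');
    COMMIT TRANSACTION;
    SET @i += 1;
END;

-- Clear the wait-statistics DMV, let the load run for a while,
-- then look at the top waits (WRITELOG is expected to dominate here).
DBCC SQLPERF (N'sys.dm_os_wait_stats', CLEAR);

SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                        N'BROKER_TASK_STOP', N'XE_TIMER_EVENT')
ORDER BY wait_time_ms DESC;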
So when is the log buffer flushed to disk? One case: even if my transaction is delayed durable, if there is another fully durable transaction in the same database, then as soon as that transaction commits, the transaction log buffer has to be flushed to disk, and that includes my entries, so my transaction will also be flushed to disk. We can manually execute the system stored procedure sys.sp_flush_log to instruct SQL Server to flush the contents of the log buffer to disk. If the log buffer fills up, then of course SQL Server has to make room for more entries and will flush it to disk; the size of the transaction log buffer is 60 KB per database, so as soon as that 60 KB buffer is full, it is flushed to disk. And every few milliseconds: if you have an idle server that does nothing, the transaction log buffer doesn't fill up, so just in case, Microsoft added an option to flush the contents of the buffer every few milliseconds. In a real production environment this doesn't make any difference, because you have transactions running all the time and a 60 KB log buffer will in most cases fill up before those few milliseconds pass; but if you're doing tests on your laptop, like I am here, you might run just a single transaction and not understand why it was flushed — it's because of the few-milliseconds threshold. So it will happen regardless of the number of transactions.

You can control delayed durability at the database level. It is either DISABLED, which is the default for backward compatibility; you can change it to ALLOWED, meaning you can control at the transaction level in this database which transaction is delayed and which is not; or you can set delayed durability to FORCED at the database level, meaning all transactions in that database are now delayed durable. If the database is set to ALLOWED, you choose which transactions are committed with delayed durability as part of the COMMIT statement, and we'll see that in the demo in a minute. Also, if you work with memory-optimized tables, with In-Memory OLTP, you can control it at the atomic block of a natively compiled procedure.

Some transactions will always be fully durable, even if you work with delayed durability. For example, transactions that perform operations against CDC or change tracking tables will always be fully durable. If you have cross-database transactions or distributed transactions, those will always be fully durable as well. And some features apply only once the transactions are fully durable on disk: for example, transactional replication will replicate transactions to subscribers only after they are made durable, so as long as they are only in the log buffer, even if you got a commit on the publisher, they will not be transferred to the subscribers. The same goes for log backups: the backup operation will only back up the contents of the transaction log file, it will not read transactions from the log buffer, so only once a transaction is durable can it be picked up by a log backup, and the same goes for anything that relies on log backups. A minimal sketch of the database- and transaction-level settings follows below.
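A minimal sketch of those settings, using the same illustrative database, table, and column names (not the exact demo script):

-- Allow per-transaction control of delayed durability in the database
-- (the other options are DISABLED, the default, and FORCED).
ALTER DATABASE VeryLargeDB SET DELAYED_DURABILITY = ALLOWED;

-- Commit an individual transaction with delayed durability.
BEGIN TRANSACTION;
INSERT INTO dbo.PageViews (PageViewDate, Url)
    VALUES (SYSDATETIME(), N'/home');
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);

-- Force any log records still sitting in the log buffer to disk.
EXEC sys.sp_flush_log;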
What are the benefits of using delayed durability? You get fewer round trips to the log file on disk, because you only go once in a while, for example when the transaction log buffer fills up, and this can be a great benefit. You get less WRITELOG wait and less latency for the transactions: you don't have to wait for the commit, it's committed instantly, which also means locks are released earlier, so you get better concurrency and less blocking of sessions, and the result of all this is better throughput. The last thing to remember is that you might lose data, of course, so this is not right for every table. If you have a table where you must have consistency and all the data must be durable, then of course this is not a viable solution. But there are many cases — again, all kinds of log tables, tables where you can afford to lose some data once in a while, assuming you have most of the data most of the time — where this can be a great feature, because most of the time, actually all of the time, you will get much better throughput, much better performance for your inserts and updates, while taking the risk of maybe losing some data at some point. So it might be relevant for some cases, but of course you need to consider whether it's acceptable or not.

Let's go back to the demo. Now I'm going to alter the database and set delayed durability to ALLOWED, and I'm going to do the same thing as before: issue insert statements, one per transaction, but now on the commit I add the new option from 2014, WITH (DELAYED_DURABILITY = ON), as in the sketch above. Let's run it and go back to PerfMon. Now we can see that instead of the roughly 2,000 batch requests per second we had before, we are at more than 3,000, which is more than a 50% improvement. The percentage of improvement will of course depend on the size of your transactions, the rate of your transactions, and so on, so it can be even much more than 50%, but we can clearly see an improvement in throughput. And the number of log flushes per second — here it is — tells me how many times the log was flushed per second. In the previous example, with delayed durability turned off, the number of log flushes per second was exactly the same graph as the throughput, the batch requests per second, because each batch request, in my case each transaction, had to be flushed to disk; that is the original behavior of SQL Server. But now we can see that we have far fewer log flushes per second and far more batch requests per second, so a much better throughput. And if we go back to the wait stats — let's clear them again, let them accumulate for some time, and run this query again — now we don't see WRITELOG at all. Let's do it again: no WRITELOG anywhere in here; we have some other waits that are not very important, but no more WRITELOG. We get much less latency and much better throughput thanks to delayed durability. Let's roll back, stop, go back; we don't need this anymore, we can close this, and we're back here. Questions about this?

[Julie] Yes: is the incremental option Enterprise Edition only?

[Guy] Incremental statistics, I guess? Well, partitioning is Enterprise only, and this is part of partitioning — you can't use it without partitioning. But in SQL Server 2016 Service Pack 1, many Enterprise Edition features are now available in Standard
Edition, and one of them is partitioning, which means you can also use incremental statistics.

[Julie] Great. And Mark Mills mentioned that this seems to be applicable to both large tables and small — sorry, what you were doing in this last section, delayed durability.

[Guy] Yeah, that's true. Of course delayed durability can affect small tables as well. The important thing here is the rate, the velocity, how fast you insert data; it doesn't matter how large the table is. If you insert data very fast, you have lots of insert transactions, and you want to increase the throughput, then one of the options, if you can afford it, if you can afford losing data, is delayed durability.

[Julie] Okay, great.

[Guy] So the next and last topic, the last aspect of working with very large tables, is maintenance. Part number three: index rebuilds. When we need to rebuild our indexes on a very large table, it might take a very long time. Before SQL Server 2014 we could rebuild all partitions offline, meaning we have a kind of downtime: we cannot access the index while rebuilding it, and it will take a while if it's a very large table. We could rebuild all partitions online, which means we can access the index during the rebuild operation; it takes more time and more resources, but we get better concurrency — we don't have to wait until the index is completely rebuilt, we can use the index while we are rebuilding it. We could also rebuild a specific partition, but only offline, which is still a great feature, because again, if we have a very large table containing seven years of data, probably the old data does not change; only recent data is inserted or updated, and there is no reason to rebuild the entire table all the time — we can rebuild only the recent partition. One of the new features in SQL Server 2014 is that now we are able to rebuild a specific partition online, which makes sense, but we couldn't do it before. Another thing we can do since 2014 is to wait at low priority when we rebuild an index online or when we switch a partition; I'll show that in a minute.

Let's start with rebuilding indexes. This is how we do an index rebuild: I alter the clustered index, in this case with REBUILD PARTITION = ALL and ONLINE = OFF; this is something I could do before 2014, of course — I rebuilt the entire index offline. Next I can specify ONLINE = ON, still PARTITION = ALL, and now I rebuild the entire table online. This takes more time, because SQL Server has to maintain the index while rebuilding it, and if there is concurrent load on the server — in this case there isn't any — it will take even more time, but eventually it completes and it's an online operation. We can also rebuild a specific partition, which is going to be much faster because it's just a single partition, but before 2014 only offline — see, it's much, much faster, almost no time. And the new thing we can do since SQL Server 2014 is rebuild a specific partition online; it's a bit longer than offline, but still, because it's a single partition, it's nothing compared to the entire table — and remember, this is a very large table. So that is one new feature. A rough sketch of these rebuild variations follows below.
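A minimal sketch of the rebuild variations just demonstrated, assuming an illustrative clustered index name on the PageViews table (not the demo's actual index name):

-- Rebuild all partitions offline (pre-2014 option; the index is unavailable).
ALTER INDEX PK_PageViews ON dbo.PageViews
    REBUILD PARTITION = ALL WITH (ONLINE = OFF);

-- Rebuild all partitions online (slower, but the index stays available).
ALTER INDEX PK_PageViews ON dbo.PageViews
    REBUILD PARTITION = ALL WITH (ONLINE = ON);

-- Rebuild only one partition offline (fast, also possible before 2014).
ALTER INDEX PK_PageViews ON dbo.PageViews
    REBUILD PARTITION = 178 WITH (ONLINE = OFF);

-- Rebuild only one partition online (new in SQL Server 2014).
ALTER INDEX PK_PageViews ON dbo.PageViews
    REBUILD PARTITION = 178 WITH (ONLINE = ON);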
Now the next new feature — let me go back to the presentation, I'll illustrate it here and then we'll go to the demo — is wait at low priority, and here is what it means. Before 2014, when we issue commands against SQL Server, we have a normal queue, a queue of tasks that need to run on the schedulers, on the CPUs. Say the first session runs an insert statement: it goes into the queue, and because the queue is currently empty, it goes onto the scheduler, the CPU, to perform the insert; now it's running. Meanwhile, another session tries to rebuild an index online; it goes into the queue and has to wait. Yes, it's online, but in order to begin the operation it has to take a shared lock on the table, and it cannot take that lock while there is an insert running, so it has to wait for the current insert statement to finish before it can start. Then maybe there is another insert statement that now waits in the same queue behind the online index rebuild. That next insert is also blocked, even though, if there were no online index rebuild, it could run — there is no reason two insert statements can't run in parallel — but because there is an online index rebuild in the queue, the insert waits behind it. Only when the first insert completes can the online index rebuild start, and the inserts can then of course work in parallel with the online index rebuild, or maybe the rebuild has already completed, so there is no issue at that point. The problem is that the second insert has to wait for the online index rebuild to start; it cannot start right away.

In 2014 we have this new feature, wait at low priority. We can specify that when we rebuild an index online, or when we switch a partition — in both cases — we wait at low priority, and then we have a separate queue for low-priority tasks, which are online index rebuilds and partition switches. Now the first insert goes into the normal queue and starts running; when the online index rebuild comes along, it goes into the low-priority queue and has to wait, of course, because the current insert statement is still running; the second insert goes into the normal queue, and since there is now nothing in its way, it runs in parallel with the first insert. Both of them can execute; only the online index rebuild still waits, because it's at low priority, and it now has to wait for both insert statements. The first completes, the second completes, and only now can the online index rebuild start. So if we don't want our online index rebuild operation to interfere with our normal workload, we can specify that it waits at low priority.

We still need to specify how much time the index rebuild will wait at low priority, and what happens when that time is over. There are three options: when the time is up, the index rebuild can continue to wait at normal priority, which means it moves from the low-priority queue to the normal queue; it can abort the operation, abort itself — stop trying to rebuild the index and try again tomorrow or in a few minutes, something like that; or we can tell it to kill all the blockers, kill the other sessions that are currently blocking it, and start the operation. Those are the three options. A minimal sketch of the syntax follows below.
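A minimal sketch of the wait-at-low-priority syntax on an online partition rebuild, using the same illustrative index name and showing one of the three ABORT_AFTER_WAIT choices:

-- Wait at low priority for up to 1 minute; if still blocked after that,
-- move back to the normal queue and keep waiting (NONE). The other
-- choices are SELF (abort the rebuild) and BLOCKERS (kill the sessions
-- that are blocking the rebuild).
ALTER INDEX PK_PageViews ON dbo.PageViews
    REBUILD PARTITION = 178
    WITH (ONLINE = ON (
            WAIT_AT_LOW_PRIORITY (
                MAX_DURATION = 1 MINUTES,
                ABORT_AFTER_WAIT = NONE)));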
Let's go to the demo. The demo goes like this: we start with session 1. In session 1 we begin a transaction and do an insert; this is the first insert, and if I look here I see nothing, because nothing is blocked yet. Now I rebuild the index online, a specific partition in this case, with ONLINE = ON and WAIT_AT_LOW_PRIORITY, and I specify the MAX_DURATION in minutes — I will wait one minute — and ABORT_AFTER_WAIT, what to do after one minute if I'm still waiting. In this case it's NONE, meaning go to the normal queue and continue to wait at normal priority. Let's run this one. Now we have session 2, which is another instance; this is the second insert — I'm not in the right database, so let's make sure we start the transaction in the right database and perform the insert. Now let's see what we have here. The session running the online index rebuild is 56, and session 56 is waiting at low priority — we have a new wait type, several new wait types actually — and we can see that it waits at low priority for both sessions 58 and 59, which are the two insert statements, because they are both running in parallel. After one minute, which should be over now, the wait type changes, because the rebuild has moved into the normal-priority queue, and it continues to wait. As soon as we commit the transactions, the first one and the second one, only then can the online index rebuild complete, and now of course we see nothing here. So that's one option.

The second option: let's do this again. Session 1, first insert; then the index rebuild, and this time I specify ABORT_AFTER_WAIT = SELF, meaning I will wait one minute and after one minute I will abort the operation, stop trying to rebuild the index. Then session 2 performs the second insert. If we look here, we see the same as before: again we have two inserts running, no problem, they are not blocked by anything, but our session 56 is blocked, waiting at low priority. What we're going to see after one minute is that session 56 is gone: it stops, it kills itself, it aborts the operation. Let's wait a few more seconds — and of course the other inserts can continue to do whatever they want, they are not blocked in any way. If we look at the rebuild session, the error is "lock request time out period exceeded", so it's just like a lock timeout, and it aborted itself. And the last thing is to commit here and commit here.

The last option: again, session 1, first insert; then the rebuild index, this time with ABORT_AFTER_WAIT = BLOCKERS, so after one minute it will kill all the other sessions blocking it, in this case the two insert statements. Now session 2, and I want to show you what happens when they are both killed. Again, the rebuild is waiting at low priority, so after one minute the online index rebuild will be able to start and complete the operation, and I want to show you what happens to the other two sessions, how it looks. In this case there is no statement running in those sessions: the insert already completed successfully, its transaction is currently active, but no statement is running, so we will not get any error message or any indication in session 1 or session 2 about being aborted. But here is what we will see: okay, so this one is done,
and this one is done too; the online index rebuild has completed. Session 1 at this point doesn't know anything happened, but if we try to commit the transaction we get this error, because we have been rolled back, and the same for session 2: it was rolled back, there is no transaction, the insert was rolled back, nothing happened. So this is how it works with BLOCKERS.

Okay, we saw all of this; let's go quickly to the summary. Questions about online index rebuilds, or wait at low priority, or anything else?

[Julie] Yes, something else: what considerations are there for the rebuild in an Always On environment, before 2016 and for 2016?

[Guy] Considerations for the rebuild... I'm not sure I understand the question — what is different regarding Always On? If you need to rebuild your indexes, you can do it just as well on your primary replica in an Always On database; of course all the maintenance, all the rebuild operations, will be propagated to the secondaries as well. But I'm not sure what the purpose of the question is; I don't see a difference in the considerations when you rebuild your indexes, online or offline, with Always On. Sorry if I'm missing something in this question.

[Julie] Okay, the next one is: does delayed durability work with in-memory databases and tables?

[Guy] Good question. I think yes, but I'm not going to commit to an answer; I need to check. I think we can use delayed durability with in-memory tables.

Okay, so let's summarize. We talked about what a very large table is — again, in my opinion it means you need to give special care to this table; as long as you can keep doing the same things as before, it would not be considered very large. We talked about what it means to work with a very large table: managing the table, specifically partitioning, and also stretching the table to the cloud with the new feature in SQL Server 2016. We talked about the ascending key problem and how to improve query performance with things like filtered statistics, incremental statistics, the new cardinality estimator since SQL Server 2014, and some other things. We talked about how to improve data loads, the rate of inserts, using delayed durability to improve the throughput, with, of course, the risk of losing data. And we talked about how to improve the maintenance of the table by rebuilding single partitions online and by specifying wait at low priority when you rebuild those partitions, so as not to interfere with your regular workload. That's what it means to work like a pro: how to solve the ascending key problem, how to solve the WRITELOG problem with delayed durability, and the index rebuilds with wait at low priority. The new features we talked about in SQL Server 2016 are Stretch Database and the new cardinality estimator improvements; the new cardinality estimator exists since SQL Server 2014 and has some great improvements, though again, in some cases it might be risky, so you have to monitor and be careful about the implications of enabling it. You have incremental statistics on partitioned tables, you have delayed durability, and you have the online index rebuild enhancements we just talked about, since 2014.

Some other things you might want to explore that I didn't cover in this session: one of them is compressing the data. If you have a very large table, it consumes a lot of disk space, and also space in memory, and also I/O; you can potentially improve all of that by compressing the data, which will dramatically reduce the space required by the table and its indexes and also reduce the amount of I/O against the table. A hedged sketch of enabling compression follows below.
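As a rough illustration only — compression is mentioned here as a further topic, not demonstrated in the session — page compression could be enabled on the illustrative demo table like this:

-- Rebuild the table (all partitions) with page compression; row compression
-- and per-partition compression settings are also possible.
ALTER TABLE dbo.PageViews
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = PAGE);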
There are other things you can do as well. One of them is optimizing parallelism: when you work with a partitioned table, a very large table, you can improve how SQL Server produces parallel plans and how it executes them at runtime. There are many things you can do here, and I didn't cover any of that in this presentation; what I can recommend is to follow Adam Machanic, who is an expert in this field — watch his recorded sessions or follow his blog. He writes and talks a lot about parallelism and is a master in this area. You can also consider using a lower isolation level: if you have a concurrency problem against this table, with many updates, then in some cases you might work with READ UNCOMMITTED or the NOLOCK query hint, which might be good enough, although it has some implications, or maybe you can use read committed snapshot and improve concurrency — of course this means more overhead on tempdb, so you have to monitor and see if it's good for your environment. And if it's a data warehouse with a star schema and you have a very large fact table, using columnstore indexes can really speed up your aggregations and star schema queries, so consider using that.

And that's it. You have my contact information here, and you can follow my blog; there is already a post on my blog which summarizes this session, and it already contains an embedded link to the recording of the previous session. As soon as the recording of this session goes live, I will put it in the blog post as well, and you can download all the materials, the presentation and the demos, from that blog post too. So it's all there, and thank you very much.

[Julie] That was very nice. There's one more question: what other operations besides index rebuilds can use low priority — maybe update statistics or backups?

[Guy] No, only two types of statements: online index rebuild — and it's only online, you cannot use it with offline — and the other statement is switch partition. Those are the only two statements you can use it with.

[Julie] Who's the person you recommended to follow for the additional topics? I think you said Adam Machanic.

[Guy] Yeah, Adam Machanic, for parallelism and optimization. It's A-D-A-M, M-A-C-H-A-N-I-C.

[Julie] And do you recommend the MAXDOP option for index rebuilds?

[Guy] The MAXDOP option for index rebuilds — yes, absolutely. Again, this is part of the optimization you can do around parallelism. If you run a large index rebuild and you have lots of cores on your machine and you don't use MAXDOP — the default is to use all the available cores — then your index rebuild can consume all the CPUs on the server, and it can choke the server, so other sessions have to wait to run every small task. So yes, one of the recommendations is to use MAXDOP to limit the number of cores that your index rebuild will use and free up resources for other queries running in parallel; a small sketch follows below.

[Julie] Well, that's it for the questions. I really want to thank you again for your time and for an excellent two-part session. This was great, thank you.

[Guy] Thank you.
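A minimal sketch of limiting parallelism on a rebuild as just discussed, again with the illustrative index name and an arbitrary MAXDOP value:

-- Cap the rebuild at 4 cores so it doesn't monopolize the server's CPUs.
ALTER INDEX PK_PageViews ON dbo.PageViews
    REBUILD PARTITION = ALL
    WITH (ONLINE = ON, MAXDOP = 4);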
Info
Channel: PASS Database Administration Virtual Group
Views: 2,033
Rating: 5 out of 5
Id: wCJYejAWfW4
Length: 61min 48sec (3708 seconds)
Published: Wed Aug 09 2017