SQL Server Database Administration for the Non-DBA


Captions
Welcome to DBA Fundamentals Down Under and the June edition of this broadcast. In today's session we've got Denny Cherry talking to us about SQL Server database administration for the non-DBA. I'll hand over to Denny shortly, but first a little bit of housekeeping. For those of you who know about PASS Summit, it's on in October later this year; if you'd like to purchase tickets and haven't already, using the code shown here you can get $150 off your ticket. Prices are going up very shortly, so combined with that $150 discount voucher you'll save over $400 by buying now rather than waiting for the July 17 price update, and you'll also go into the draw for a $300 Amazon voucher using this code. For those of you who don't know, the sessions are recorded and are usually available on the DBA Fundamentals archive site and YouTube channel a couple of days after the session has finished, so you'll have the ability to re-watch this recording and previous recordings, as well as see the slides from today's session. Apart from this particular virtual chapter there are other virtual chapters covering a range of topics, so if you're not already in those, visit the PASS website to find out what they are and sign up, and you'll be able to keep up with content from some fantastic speakers. With that, I'll hand over to Denny and we can get today's session running.

Cool, thank you. You should be able to see my slides now. We can see the start of your slide deck, yes. So thank you for coming, and welcome. This was supposed to be the world's easiest presentation for me: I was supposed to be presenting it at 8 o'clock at night in California. Unfortunately I am in India, so it's currently 8 o'clock in the morning instead — slight change of plans for me, but it's not that early. So welcome to SQL Server database administration for the non-DBA. My contact details are on the slide.
If any questions come up that we don't get answered today, feel free to shoot me an email; I'm more than happy to answer them. If you do have questions during the session, go ahead and put them in the Q&A panel and our host will read them out as we go. So let's get on to the actual content.

A little bit about me real quick: I'm the owner of Denny Cherry & Associates Consulting. I'm from San Diego, although right now I'm in Bangalore, India. I've written six books, including Securing SQL Server, and dozens of magazine articles for various websites and publications, including SQL Server Magazine, SQL Server Pro and TechNet Magazine. I'm a Microsoft MVP — we used to be called SQL Server MVPs; it's now the Data Platform MVP — I'm a Microsoft Certified Master for SQL Server 2008 and 2012, and I'm also a VMware vExpert.

So let's get to the stuff you actually came here to hear about. The agenda for the next hour or so: we'll look at how files, filegroups and disks all work together in SQL Server; we'll talk about backups, how we should be taking them and what we should be doing with them to make sure we can always keep our databases online; we'll talk about the recovery models in SQL Server and what they're used for; we'll talk about database maintenance and what sorts of maintenance we should be doing to keep our databases healthy and working well; we'll review compression and the various compression options available in SQL Server; we'll look at corruption — how it happens, what it is, how to check for it and how to fix it; we'll look at the compatibility level, see what it controls and what it doesn't control, and dispel some myths; and then we'll look at indexes and see how to use them to make things quicker and better.

So, on to the actual presentation: files, filegroups and disks. With SQL Server, everything is stored in files that live in filegroups. These filegroups are just logical constructs that exist within SQL Server, and they make it possible to spread a table across multiple hard drives. We do this for one of a couple of reasons. One reason is that the database is bigger than the hard drive we've presented to Windows. Perfect example: I've got an E drive on my server that's two terabytes, because that's the limit of an MBR-configured hard drive, and my database needs to grow beyond the size of that disk. If we didn't have filegroups, I would need to create a new file, put it on a new hard drive — say the F drive — and then tell SQL Server to move tables from one disk to another. Not exactly efficient. Instead we have the filegroup construct: within a filegroup we put multiple files, and SQL Server simply puts the data in whichever of those files has free space. Those files can be on as many hard drives as we want, or on the same hard drive. The other reason we might want to spread a table across multiple hard drives is performance: you can get more throughput out of multiple disks than out of a single disk.
That said, I don't see that being done for performance purposes much anymore, because we now have SSD storage: if you need high-performance storage you're not going to stripe together a ton of spinning disks, you're going to go out and get some SSD drives. Those disks hold files, those files are in filegroups, and when we create a table we tell SQL Server which filegroup to put it in. You'll see this if you script out an existing table: it's going to say ON [PRIMARY] at the bottom of the CREATE TABLE statement, which simply tells SQL Server that the filegroup it's putting the data into is called PRIMARY — the default filegroup name for every database on a server. You can create multiple filegroups if you want to isolate some tables from other tables so they don't sit on the same physical disks, or to isolate non-clustered indexes from the base data so you can get more throughput to the disk. Again, not something I see done all that often anymore — we used to do this a lot back in the '90s and 2000s — because today we've got so much high-speed storage available so inexpensively, compared to what it used to cost to get those levels of throughput. But it is still an option available to us.

Now, every database you create is made up of at least two database files. The first file holds the data — that's your MDF; you can have additional data files, which are called NDFs. The second file is your transaction log file. Your main data file always has a file ID of one, and your transaction log file always has a file ID of two. These files are used very differently. If we need more performance from the data files, we add more data files on different hard drives. If we need more performance from the transaction log, adding more log files is not going to help, because the two kinds of file are used very differently: SQL Server can grab free space anywhere in a data file for new data that comes in, but the transaction log is a circular log file — it's always written in sequence. Adding more transaction log files won't improve performance, because SQL Server will keep using the existing log file until it's full, reach the end of that file, and only then move on to the next one. There are some very advanced things you can do that could potentially speed up transaction log throughput, but those are extreme edge cases, not something you'll need unless you're pushing huge amounts of throughput through your transaction log. The way most people performance-tune the transaction log is faster hard drives — put SSD behind it.
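As a minimal sketch of the filegroup mechanics described above — the database, filegroup, file names and paths here are hypothetical, not from the talk:

-- Add a second filegroup to a database and put a file in it,
-- then create a table on that filegroup instead of PRIMARY.
ALTER DATABASE SalesDB ADD FILEGROUP Archive;

ALTER DATABASE SalesDB
ADD FILE
(
    NAME = SalesDB_Archive1,
    FILENAME = 'F:\Data\SalesDB_Archive1.ndf',
    SIZE = 10GB,
    FILEGROWTH = 1GB
)
TO FILEGROUP Archive;

-- Scripting an existing table would show ON [PRIMARY];
-- here we explicitly place a new table on the new filegroup.
CREATE TABLE dbo.OrderHistory
(
    OrderID INT NOT NULL,
    OrderDate DATETIME2 NOT NULL
) ON Archive;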
Now let's talk about backups for a little bit. There are three kinds of backups I want to focus on: full backups, differential backups, and transaction log backups. Those three work together to give you the ability to recover your line-of-business databases to whatever point in time you were at when the system failed. There are some other options, namely file backups, which let you back up just part of the database. That's typically done for things like data warehouses, where large portions of the database aren't being modified very often or at all, and those get into much more complex restore scenarios. For 90-plus percent of systems, we need to focus on the three at the top: fulls, differentials and transaction logs.

Rule number one when it comes to backups: make sure those backups are saved on another server. I cannot tell you how many times I've walked into a client, looked at how the backups are configured, and found the backups being saved to the same hard drive as the database. What good are those backups? They'll usually tell me the backups are there so that if somebody deletes data they can restore it — okay, but what happens if the hard drive fails? You've now lost the server, the database and the backups. So rule number one: take backups. Rule number two: get those backups onto another server or another device. If you're running SQL Server 2012 SP1 CU2 or higher, you can back up to Azure very easily using the backup-to-URL feature, which writes straight to Azure blob storage. If you're running an older version of SQL Server, you can still back up to Azure: use the file share feature available in Azure storage, which presents a network share backed directly by the blob store, and back up to that using a UNC path. Getting backups out of your data center is not that hard at this point. In fact, one of our customers doesn't normally back up to Azure blob storage, but when we need to send a copy of a database to Microsoft to look at something, we do back up to Azure — and it's actually faster than backing up to a local drive. Something to keep in mind: backing up to Azure may be quite fast depending on what your network bandwidth looks like, and that bandwidth may be less expensive than paying for local disks fast enough to hold all your database backups.

So, those full backups. A full backup is a point-in-time copy of the database — it doesn't matter what recovery model you're in, and we'll talk more about recovery models in a few slides. It's a copy of the database in the state it was in at the end of the backup. Transactions that are in flight while the backup is being taken will complete successfully and will be included in that backup. This is one of the reasons you might see your transaction log grow while your backups are running: all of those transactions have to stay in the transaction log for the duration of the full backup.
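A minimal sketch of the three backup types just described, written to a share on another server per rule number one — the database name, share path, storage account and credential name are all hypothetical:

-- Full backup, typically daily or weekly.
BACKUP DATABASE SalesDB
TO DISK = '\\backupserver\sqlbackups\SalesDB_Full.bak'
WITH COMPRESSION, CHECKSUM, INIT;

-- Differential backup: everything changed since the last full.
BACKUP DATABASE SalesDB
TO DISK = '\\backupserver\sqlbackups\SalesDB_Diff.bak'
WITH DIFFERENTIAL, COMPRESSION, CHECKSUM, INIT;

-- Transaction log backup, run throughout the day (full or bulk-logged recovery only).
BACKUP LOG SalesDB
TO DISK = '\\backupserver\sqlbackups\SalesDB_Log.trn'
WITH COMPRESSION, CHECKSUM, INIT;

-- On SQL Server 2012 SP1 CU2 and later you can also back up straight to Azure blob storage:
-- BACKUP DATABASE SalesDB
-- TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/SalesDB_Full.bak'
-- WITH CREDENTIAL = 'MyAzureCredential', COMPRESSION, CHECKSUM;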
Once the full backup has copied all the base data, it copies those transactions out of the transaction log and into the full backup as well, so they're included with it. Full backups should typically be done either daily or weekly, depending on how many files you want to have to restore: whether this is a line-of-business system where you need to restore to an exact second, or a data warehouse where you're just doing one differential backup on top of the full every day. For line-of-business systems, I have my customers take a full backup daily and then transaction log backups throughout the day. That reduces the number of files we need to restore, which reduces the amount of time it takes to do the restore and get the system back up and running in the event of a failure. Don't forget, whenever you're configuring your backups: the whole goal, after you've taken them, is to be able to restore the database when something fails. So configure these backups in such a way that you can restore the system as fast as possible, with as little impact to the production environment as possible. We'll get to the other kinds of backups in a second, but let's talk about recovery models first.

There are three recovery models available to us in SQL Server: simple, bulk-logged and full. The recovery model tells SQL Server how much logging to keep, and in some cases how much logging to do. By default, no matter what recovery model you're in, most commands executed against the database are fully logged: your inserts, updates and deletes are always fully logged, every single time you run them, in every recovery model. There are some commands, though, which may or may not be fully logged depending on the recovery model. If you're in the full recovery model, every single command you execute is fully logged. The reason is that we need those commands in the transaction log so they can go into the transaction log backups, so we can rebuild the database to the exact state it was in at whatever time we decide to stop the restore — either a point in time, or the end of the log. If we're in the bulk-logged recovery model, we still have point-in-time recovery; we just don't have full logging of all commands. Anything that's considered a bulk-logged command gets what we call minimal logging: the transaction log records the fact that an extent or a page was allocated to an object, but it doesn't record the data that was changed on that page, so it's a much smaller amount of log. We can still do point-in-time recovery under the bulk-logged recovery model — to a point. Basically the rule is: if you're in the bulk-logged recovery model and you're doing a bulk operation, you cannot do a point-in-time restore into the middle of that operation.
So if you're bulk-loading data all day long using BCP, or SSIS with the fast load option set, you won't be able to do a point-in-time restore to any point where one of those operations was running; you have to restore to either the beginning or the end of that operation. That can be problematic depending on how often you're doing bulk-logged operations. You can still take transaction log backups while you're using the bulk-logged recovery model. For any bulk operations that happened since the previous transaction log backup, obviously we can't just back up the log records, because we need the data as well — so the log backup also backs up the extents of the database that were bulk-logged during that period. That can make the transaction log backups quite a bit bigger than you'd expect given that we're not fully logging that data, because the actual data is being backed up as part of the transaction log backup. When it comes to restoring, you don't need to worry about that: you just restore the transaction log file and it restores those extents as part of the restore; there's nothing special to do.

The simple recovery model logs just like bulk-logged: we're still doing full logging of inserts, updates and deletes, and minimal logging for the minimally logged bulk operations. The difference between simple and the other two is what happens at checkpoint. In full and bulk-logged, the checkpoint process tells SQL Server to write all the modified data pages to disk, records in the transaction log that it did so, and then waits for the next transaction log backup to come through and back up the log records. In the simple recovery model, when checkpoint runs it does the exact same thing — writes the dirty pages to disk and marks in the transaction log that the checkpoint ran — but it also marks all the committed transactions in the transaction log as reusable, allowing SQL Server to overwrite them. That's why, under the simple recovery model, we don't get point-in-time recovery: we're not keeping the transaction log records around waiting for a transaction log backup. For line-of-business systems, simple is probably not the best idea, because we need point-in-time recovery for those. For data warehouses, or systems we can rebuild easily from the production data, simple recovery is typically used so we don't have to worry about point-in-time restores.

The key thing to remember here: operations are always logged. No matter what we're doing, we're always logging the operations; there is no way to turn off transaction logging in SQL Server. The only exception is non-durable tables in SQL Server 2014 and above, when we're using In-Memory OLTP (project Hekaton). Those non-durable tables turn off transaction logging because the data is not durable — we know the data can be lost, so we don't care about logging it.
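A minimal sketch of checking and changing the recovery model discussed above — the database names are hypothetical:

-- See the current recovery model for every database.
SELECT name, recovery_model_desc
FROM sys.databases;

-- Line-of-business database: keep full recovery so log backups
-- give us point-in-time restore.
ALTER DATABASE SalesDB SET RECOVERY FULL;

-- Data warehouse that can be rebuilt from the source system:
-- simple recovery, no point-in-time restore needed.
-- ALTER DATABASE ReportingDW SET RECOVERY SIMPLE;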
Now let's talk about restoring data. We always start with a full restore: step number one of any restore process is restoring the full backup, so you need a full backup to start from. Once that full backup is restored, if you have a differential you can restore that differential; if not, you can go straight to restoring transaction logs, skipping the differential. There's also a page-level restore you can use, or a file-level restore if you're doing file backups — what you restore depends on what you've been taking. But the first step is always that full restore, and then you restore either differentials or transaction logs on top of it.

The reason we want to take a full database backup regularly — at least daily or weekly — is that a restore has to start with that full and then roll all the logs forward on top of it. If we only take a monthly full and then take transaction log backups throughout the day, every day, and the database fails at the end of the month, we now have to restore thirty days' worth of transaction logs to roll everything forward. Suddenly these restores can take a very long time, because we're potentially restoring tens of thousands of files, depending on how often we take our transaction log backups. That's why we need a full backup every day, or at least every week. There are stories out there of companies that took one full backup, never took another, and just kept doing transaction log backups throughout the day, every day, for years — and then suddenly they needed to restore, and it was going to take literally months to roll all those transactions forward and actually bring the system back online. Obviously that's a bad situation to be in; that's a "we just went out of business" sort of situation. So be very careful about how these things are configured, so you can get your databases back up and running in a reasonable amount of time.
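A minimal sketch of the restore sequence just described — full, then differential, then log backups, optionally stopping at a point in time. The file names and the STOPAT time are hypothetical:

-- Step 1: restore the full backup, leaving the database ready for more restores.
RESTORE DATABASE SalesDB
FROM DISK = '\\backupserver\sqlbackups\SalesDB_Full.bak'
WITH NORECOVERY;

-- Step 2: the most recent differential, if you have one.
RESTORE DATABASE SalesDB
FROM DISK = '\\backupserver\sqlbackups\SalesDB_Diff.bak'
WITH NORECOVERY;

-- Step 3: every log backup taken since that differential, in order.
RESTORE LOG SalesDB
FROM DISK = '\\backupserver\sqlbackups\SalesDB_Log1.trn'
WITH NORECOVERY;

-- Last log: stop at a point in time and bring the database online.
RESTORE LOG SalesDB
FROM DISK = '\\backupserver\sqlbackups\SalesDB_Log2.trn'
WITH STOPAT = '2016-06-26T14:30:00', RECOVERY;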
Database maintenance. Our databases do require a little bit of maintenance — a little tender loving care — to keep running at optimal performance. We do that through one of a couple of operations: an index rebuild or an index defrag (reorganize). In either case, we're simply telling SQL Server to take all the data in the index and re-sort it so that it's physically stored on disk in the most optimal manner possible. It doesn't matter which of these operations you do — the end result is the same. The only extra thing you need to do with a defrag is an update statistics afterwards; once you've done that, the output of the defrag-plus-update-statistics is identical to what comes out of an index rebuild, so you don't need to do both. Why do we have two operations? Because the amount of time they take varies depending on how fragmented your indexes are. Index fragmentation is simply how spread out across the disk the data is; the more fragmented an index is, the longer it takes to rebuild. The general rule of thumb: if the index is more than 30% fragmented, a rebuild will probably be faster; if it's less than 30% fragmented, a defrag will probably be faster. That's the rule of thumb for picking between the two. You can write your own code that looks at fragmentation through the dynamic management views and tells you which indexes are above or below 30% — there's plenty of documentation out there on making that code work as expected — or you can take the easy way out and grab one of the various free or paid maintenance tools. Ola Hallengren has some really good scripts to take care of this (you can get those from ola.hallengren.com), or you can grab something like Minion Reindex, which also does a really good job of seeing which indexes are fragmented and defragmenting them in the appropriate manner based on how fragmented they are. I can't remember off the top of my head whether the Minion index maintenance module is free or paid, so check that.

There are some other differences between the two operations. An index rebuild can be an online or an offline operation, depending on which version and edition of SQL Server you're running and what data types you're using. It was originally always offline back in the SQL Server 2000 days; in 2005 they introduced online index rebuilds in Enterprise Edition, and it's still an Enterprise Edition feature to this day in SQL Server 2016. A rebuild effectively creates a new index, drops the old one and puts the new one in place; because it's effectively a brand-new index, the statistics end up fully updated, since it read all the data as it was built. A defrag (reorganize) is always an online operation: it only moves the rows and pages it needs to move to get the index clean, so it's typically much less work if the index isn't very fragmented. The other big difference, as I mentioned, is that the defrag does not update the statistics, whereas the rebuild does. So those are your two options for index maintenance.
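A minimal sketch of the rule of thumb above — check fragmentation through the DMV, then rebuild or reorganize accordingly. The table and index names borrow the AdventureWorks example used in a moment, so treat them as illustrative:

-- How fragmented are the indexes in the current database?
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 5;

-- More than ~30% fragmented: rebuild (ONLINE = ON needs Enterprise Edition).
ALTER INDEX IX_SalesOrderDetail_ProductID ON Sales.SalesOrderDetail
REBUILD WITH (ONLINE = ON);

-- Less than ~30% fragmented: reorganize, then update statistics yourself.
ALTER INDEX IX_SalesOrderDetail_ProductID ON Sales.SalesOrderDetail REORGANIZE;
UPDATE STATISTICS Sales.SalesOrderDetail IX_SalesOrderDetail_ProductID;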
Now, I've mentioned this thing called statistics a few times, so you may be wondering what that is. Statistics are how SQL Server figures out how to get at the data you're looking for. Every time you run a query, you get what's called an execution plan — you've probably heard of these before. The execution plan is literally the map of which operations SQL Server is going to perform to get the data you asked for: is it going to do an index seek, an index scan or a table scan; what kind of join operation will it use, a nested loop or a hash join. All of that is stored in the execution plan, and the way SQL Server decides what to put in that plan is based on the statistics on the table.

Statistics are a sampling of the values within a table or an index. A statistic contains up to 200 samples — steps — of the values within the column the statistic is created on, and it also tracks the estimated number of rows between those sampled values. That may sound a bit weird, so let's look at a statistic for a second (I keep pens with my Surface for exactly this reason). Here we have the name of the statistic, the date it was last updated — obviously a while ago, don't worry about that — the number of rows in the table, and the number of rows sampled when the statistic was built. In this case those two values are the same, because this table is fairly small; on a larger table the row count would be much larger than the sampled count. This statistic has 200 steps, it is not built on a string, and it is not filtered — if it were filtered, it would show the filter expression and the number of rows the filter covers. What is it built on? The ProductID column of the Sales.SalesOrderDetail table, via an index called IX_SalesOrderDetail_ProductID. That index contains three columns — ProductID, SalesOrderID and SalesOrderDetailID — but the statistic is only built on ProductID, because we only have single-column statistics: the histogram is kept on the leading column only. If we had multi-column histograms, the statistics would be much larger, contain much more information, and take SQL Server much longer to work through when building an execution plan. Statistics are built around the fact that SQL Server needs to process them very quickly so it can build the execution plan quickly.

If we look at the statistic itself, there's a column called RANGE_HI_KEY: that's the ProductID — the actual value from the table. The rest of the information in the bottom part of the screen describes the rows in the table relative to that RANGE_HI_KEY. The interesting number is EQ_ROWS, which tells us how many rows are equal to that value: if I go look at the table for ProductID 718, I should find about 219 rows with that value — and I'll probably find exactly that many, because this is a sample database, AdventureWorks. You can also see that statistics aren't perfect: it's telling us there are 218 values in the range between 718 and 719. We know this is an integer column — because I've seen the table, that's how I know — so there's nothing between 718 and 719, and that particular piece of the statistic is wrong. SQL Server doesn't know that, though, so it will account for it when building whatever execution plan it gives us. This is all the information SQL Server has when it builds execution plans, and it's why those plans might occasionally be wrong and why we might see performance problems from them: the plan is built from the statistics, not from the actual data in the table.
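A minimal sketch of pulling up the statistic being described, assuming the standard AdventureWorks sample schema:

-- Header, density vector and 200-step histogram for the statistic
-- on the leading column of IX_SalesOrderDetail_ProductID.
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', 'IX_SalesOrderDetail_ProductID');

-- Compare the EQ_ROWS estimate for ProductID 718 against the real row count.
SELECT COUNT(*) AS actual_rows
FROM Sales.SalesOrderDetail
WHERE ProductID = 718;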
You might be asking why we don't use the actual data in the table when building execution plans. The reason is that we don't want plan compilation to take too long: if SQL Server had to query the entire table, it could take a long time to build an execution plan, especially if there's no index on the table. Obviously this table only has a hundred-odd thousand rows, so it wouldn't take that long — but say we have a table with a couple of billion rows; now it would take forever just to compile the plan if we waited for SQL Server to read all that data. To solve that problem we have these statistics, which are a sampling of the data. Unfortunately we only get those 200 steps in the histogram, so estimates can get kind of iffy on really big tables, but we have to work with that because we don't have any other options.

Now let's talk about compression. Compression is how we can get free space in our databases. It saves space inside the database, it saves space in memory, and it can save space in our backups as well. It saves space within the database because the data is stored compressed on the actual hard drive; and if the data is stored compressed on disk, that means it's also compressed when it gets loaded into the buffer pool — we do not decompress the data in the buffer pool; the data stored there is identical to the way it's stored on disk. And if we're compressing our data, then obviously when we back it up, the backup is smaller as well. Compression costs us CPU, but it's usually worth it given how much we save.

For data compression within the data file we have two options: row compression and page compression. Row compression is horribly named; basically it changes the way numbers are stored so they take the minimum amount of space possible. If I create a table with a numeric(16,2) column, that number always takes the same fixed number of bytes, guaranteed. If I turn on row compression, the space taken up by whatever number I put into that column is only the space that value actually needs: if I put the number 2 into that column without row compression it takes the full width; with row compression it takes a single byte, because the number 2 fits in one byte. That's where row compression comes in handy — it does nothing for text values, it only works on numeric values. Page compression, on the other hand, works on text strings as well, so you can compress those text values and get the most space savings. How much you'll save is determined entirely by what kind of data you have: large amounts of duplicate data on a page compresses really well; very distinct data doesn't compress nearly as well. Typically you'll see somewhere between 30 and 60 percent compression of a table when you compress it.
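A minimal sketch of estimating and then enabling compression on a per-index basis, using the estimation procedure mentioned just below. The index names assume the standard AdventureWorks schema, so treat them as illustrative:

-- Estimate how much space page compression would save on a table's indexes.
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'Sales',
    @object_name      = 'SalesOrderDetail',
    @index_id         = NULL,          -- NULL = all indexes on the table
    @partition_number = NULL,
    @data_compression = 'PAGE';

-- Compress the clustered index (the table itself) with page compression...
ALTER INDEX PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID
ON Sales.SalesOrderDetail
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- ...and a numeric-only non-clustered index with row compression.
ALTER INDEX IX_SalesOrderDetail_ProductID
ON Sales.SalesOrderDetail
REBUILD WITH (DATA_COMPRESSION = ROW);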
There is a stored procedure in SQL Server that will show you how much compression you'd get: sp_estimate_data_compression_savings, or something equally ridiculous — it's just about the longest stored procedure name there is among the system objects, so if you go find the longest object name, you'll find the right one. Data compression, row or page, is a per-index decision: you pick, index by index, how you want to compress the data. If you've got a table with a lot of text fields, you might want to compress the clustered index using page compression; if you've got indexes that are built only on numeric columns, you might want row compression on those, because there's no point wasting the CPU cost of page compression on an index that doesn't contain any text. Backup compression is just an on-or-off setting, and if you're already doing row or page compression, backup compression may or may not help much — you'll need to look and see. The good news: backup compression was introduced as an Enterprise Edition feature in 2008 and became a Standard Edition feature in 2008 R2. Data compression — row or page — is an Enterprise-only feature, so you will need Enterprise Edition of SQL Server for that; it can definitely be worth it, though. And data compression, both row and page, is available in Azure SQL Database in all editions, so even if you're running the Basic tier for dev and test, you can still use this Enterprise Edition feature, which is kind of cool. Backup compression doesn't really matter in Azure SQL Database, because you're not worrying about the backups there.

Now, data corruption. Data corruption is obviously a big deal. All databases can become corrupt, and many of them will at some point. What causes that corruption, almost every single time, is a hardware problem; it is almost never a bug in SQL Server. We should be checking for corruption regularly — and by regularly I mean as often as we can get away with running CHECKDB. How do I know corruption is almost always a hardware problem? Because the SQL Server product team at Microsoft has done some analysis — which they technically haven't released to the public, but which I can talk about because I've heard them present it. Looking at all the corruption reported to Microsoft's customer support team over the years, roughly 99.99% of the corruption problems they see are storage — some sort of problem with the storage stack — a tiny fraction are bad RAM, and a tiny fraction are an actual bug in SQL Server. This excludes SQL Server 6.5, because that thing would corrupt your database if you sneezed at it. So it is very, very rare for a bug in SQL Server to cause data corruption. Those bugs do creep up during the development phase — if you're running a beta version of SQL Server, good luck, and there were some problems during the SQL Server 2016 development cycle that got fixed — but as production bugs go, I've only heard of one, knock on wood, since SQL Server 2000 that actually made its way into production. That was in SQL Server 2012.
It was in a specific service pack and CU, and it involved doing online index rebuilds of certain kinds of indexes in parallel, with a bunch of other requirements you had to hit, so very few people actually hit it.

Now, exactly what is corrupt within your database determines whether we can fix the problem with or without data loss. Basically, it works like this: if a non-clustered index is what's corrupt, we can fix that without losing data, because we can simply drop that non-clustered index and rebuild it — it gets rebuilt from the data stored in the base table and the corruption goes away. If the corruption is in your clustered index, however, that is your actual table, and we're not going to be able to rebuild that without losing data; at that point it's really a restore of whatever's missing from a database backup.

We check for corruption, and repair it, using DBCC CHECKDB. When we run CHECKDB, it tells us the repair level that's necessary to fix the corruption in the database — it depends on how corrupt the database is. At the bottom of the output, if there is corruption, it tells you the flags you need to use to repair the database. If it says REPAIR_ALLOW_DATA_LOSS, it is going to lose data, and that tells you that odds are you've got a table that is corrupt. Depending on what's corrupt, it may or may not be fixable without losing data — it depends on which pages are corrupt and how much corruption there is. It also depends on how good your backups are: if your backups are solid — you've got full backups, you've got log backups — then when corruption happens we can rebuild that data relatively easily. Corruption is going to happen; it's just a matter of how well you prepare for it, and that determines how easily you survive it.

I'm actually in the middle of dealing with a database corruption problem for one of my clients right now. They have a lot of corruption — almost a thousand pages were corrupt within their database. It turned out to be a SAN problem, and they were in the middle of replacing the SAN anyway, which made life easier as far as explaining what was going on. But they had literally a thousand pages go suspect across a dozen tables, including some system objects. If your system objects become corrupt, things are a whole lot worse, because there is no way to repair the database and your only option at that point is a restore. That can put you in a very bad place very fast. The odds of a system object becoming corrupt are exactly the same as the odds of any other object becoming corrupt — it's just a matter of how many pages you have and which page was being written to when the storage hiccuped and mangled the data; it's pure chance whether it's a user table you've created or a system table. So, like I said, some of these things can be repaired, and sometimes you're just going to have to restore from backup. It's all about preparing for corruption so you can recover from it as quickly as possible. I've had to deal with it maybe three or four times in my career — it hasn't been that often — and some people have to deal with it a lot more often. It mostly depends on how good the hardware you're running on is, because that's the determining factor in how often corruption happens.
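A minimal sketch of the corruption checks discussed above — the database name is hypothetical, and the repair option is shown commented out because, as noted, REPAIR_ALLOW_DATA_LOSS loses data and a restore is usually the better answer:

-- Full logical and physical checks; run as often as you can get away with.
DBCC CHECKDB ('SalesDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Lighter-weight physical-only check, e.g. on a busy production server
-- when the full logical checks are offloaded to a restored copy elsewhere.
DBCC CHECKDB ('SalesDB') WITH PHYSICAL_ONLY;

-- Last resort only, and only at the repair level CHECKDB told you to use:
-- ALTER DATABASE SalesDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
-- DBCC CHECKDB ('SalesDB', REPAIR_ALLOW_DATA_LOSS);
-- ALTER DATABASE SalesDB SET MULTI_USER;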
So let's talk about happier things than databases going suspect: compatibility levels. The compatibility level in SQL Server is a database-level setting, and it tells SQL Server what version of the T-SQL language to support within that database. It does not — repeat, does not — affect which version of SQL Server the database can be restored to after you back it up. If I have a SQL Server 2012 server with a database in SQL Server 2008 compatibility mode, I cannot back that database up and restore it to a SQL Server 2008 server; I can only restore it to a SQL Server 2012 or higher server. Changing the compatibility level does not change where that database can be restored. The reason is that different versions of SQL Server use different physical database file formats — the file format has changed with every major version since SQL Server 2000 as they've revved the product — so older versions of SQL Server don't know how to read the newer data files and can't touch them. That's why you cannot take a 2012 database and move it back to a 2008 R2 system.

SQL Server typically supports several down-level compatibility levels. SQL Server 2014 supports compatibility levels back to SQL Server 2008 and everything in between; SQL Server 2012 supports back to SQL Server 2005; SQL Server 2008 R2 supports back to SQL Server 2000; and SQL Server 2016, I believe, still supports 2008 and up — I'd have to double-check that and update my slide accordingly. So keep in mind you're not going to be able to go back forever, and the setting definitely changes how SQL Server handles your T-SQL depending on which level you're using. A perfect example is flipping between the 2008 and 2012 compatibility levels: if I take a database, put it in the 2008 compatibility level, and try to execute a stored procedure using a function as an input value to that procedure, it will fail; if I simply change the compatibility level from 2008 to 2012, it will succeed, because that was supported at the 2012 compatibility level and not at the 2008 one. Those are the kinds of things that change between compatibility levels. It doesn't necessarily mean everything stops working; it depends on whether you're performing a database-scoped operation or a server-scoped operation. Perfect example: I can take a database that's in the 2008 compatibility level and put it into an availability group. Obviously availability groups did not exist in 2008 or 2008 R2, but because an availability group is a server-scoped object — ALTER AVAILABILITY GROUP is a server-scoped operation — I can put any database into an availability group regardless of its compatibility level. I'm making a change to a server-scoped object, not a database-scoped object, so the compatibility level doesn't apply.
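A minimal sketch of viewing and changing the setting — the database name is hypothetical; 110 is the SQL Server 2012 compatibility level, 100 is 2008/2008 R2:

-- What level is each database currently running at?
SELECT name, compatibility_level
FROM sys.databases;

-- Move a database from the 2008 level up to the 2012 level.
ALTER DATABASE SalesDB SET COMPATIBILITY_LEVEL = 110;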
Now let's talk about indexes. Indexes are the go-faster button. If you've ever been poking around in SQL Server, playing with all the settings, trying to find the go-faster button to make things quicker — indexes are it. Indexes are how we make the database go faster. Non-clustered indexes — really, any index in general — are sorted based on the columns you've specified in the index. If I take a non-clustered index and create it on last name, then first name, then hire date, the index will be sorted by last name, then first name, then hire date. Say I have an employee table with employee ID, first name, last name, hire date, department ID and so on, and I want to create an index on last name, first name: what SQL Server basically does to create that index is run the query SELECT LastName, FirstName FROM Employee ORDER BY LastName, FirstName, and the output of that query is how the index gets built. Indexes do take up space — they have to be stored on the hard drive; that's the whole point, we're trading space for speed. Disk is cheap, so I'm happy to spend disk on my indexes so that my queries run faster and my employees or customers — whoever is using the system — spend less time waiting for the database to respond. When you first start playing with indexing you'll notice it's half art, half science; it takes a while to get the hang of it, but once you do it becomes second nature and it's very easy to figure out how to make it all happen.

We've got a few different kinds of indexes available to us: clustered indexes, non-clustered indexes, full-text indexes, spatial indexes, columnstore indexes, XML indexes, and semantic search. Which one should you be picking? Odds are it'll be a non-clustered index; everything else you don't really need to worry about too much. The clustered index is your base table most of the time; the non-clustered index is how we do the performance tuning — that's the go-faster button in almost all cases. If you're working on a data warehouse you'll want to hit columnstore, especially on SQL Server 2014 and above. If you've got columns with XML data and you want to search inside those XML documents using XQuery, that's where XML indexes come in. And if you need full-text searching — Google- or Bing-style searching of the text in your database — that's what the full-text index is for.

So how should we build these indexes? Here we have a query that we need to index: SELECT FirstName, LastName FROM Employee WHERE LastName = 'Cherry'. The index we create for that looks just like this: CREATE INDEX, give it a name, ON the table we're building it on, open paren, whatever columns we need to search by — in this case we're only searching by LastName — and then we INCLUDE the FirstName column. The way we decide what goes where: anything in the WHERE clause becomes what we call the key columns, and anything that's in the SELECT list being returned but not in the WHERE clause goes in the INCLUDE columns.
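A minimal sketch of the index just described, written out in full — the table and column names follow the employee example in the talk, so treat them as illustrative:

-- Query to support:
--   SELECT FirstName, LastName FROM dbo.Employee WHERE LastName = 'Cherry';
CREATE NONCLUSTERED INDEX IX_Employee_LastName
ON dbo.Employee (LastName)        -- WHERE-clause columns become the key columns
INCLUDE (FirstName);              -- selected-but-not-filtered columns go in INCLUDE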
How do we know what order to put the key columns in? We put them in order of selectivity. This example only has one key column, so it's easy, but if we had multiple columns in our WHERE clause we would need to look at the data and see which of those columns is the most selective — in 99.9% of cases, that's the first column we want in the index — and then, as each remaining column becomes the next most selective filter after the ones before it, we keep adding them in that order of selectivity. The order of the columns in the INCLUDE list is irrelevant; put those in whatever order you want.

We can use filtered indexes as well. Filtered indexes are kind of cool because they allow us to remove rows from a non-clustered index, which makes the index smaller and makes it perform better. Here we've got a similar statement: SELECT FirstName, LastName FROM Employee WHERE LastName = 'Cherry' AND Active = 1. To index that, I'd create the same index I had before, but notice that down at the bottom I have a WHERE clause inside my CREATE INDEX statement. That WHERE clause removes from my non-clustered index any rows whose Active value is not equal to one, so only the rows where Active = 1 are included in my index. This can be very handy, especially on very large tables, to keep only the data we really need in the index. Should you be using filtered indexes all the time? No, definitely not, but there are definitely situations where they make the most sense.

Indexes are not free. They require space to store, and they do technically slow down insert, update and delete operations — not by much, we're talking nanoseconds, maybe a millisecond — but they make our SELECT statements much, much faster. I've spent I don't know how much time arguing with developers who don't want to slow down their insert and delete operations but are willing to accept 45-second SELECT statements just to save that extra millisecond on inserting a new row. That's not a good argument. If you've got long-running queries in your app and your users are complaining that it's slow, you don't care about the extra millisecond it takes to insert or change data in the table; you need to solve the 45-second query. I was working with one client once that had this really cool dashboard, but it was taking about five minutes to load because of all the long-running queries behind it. The developers had spent all this time building asynchronous processing into it so the dashboard would load empty and then slowly populate as the database queries on the back end returned — basically the goal was that the user wouldn't have to sit staring at a spinning icon; they wanted the page to pop and then backfill. We created a couple of indexes on the tables, and when the dashboard loaded it no longer had to wait for the asynchronous stuff — by the time the dashboard had loaded, all the queries had already returned. You should have seen the look on the developer's face when he realized he'd spent something like a week making all those calls asynchronous instead of spending five minutes building indexes. He was not all that impressed with himself at that moment. So indexes are going to make things faster, and it's going to be worth it.

That said, we only want to create the indexes we actually need. Having unused indexes lying around costs us space and time, with no benefit.
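A minimal sketch of the filtered index just described, again using the illustrative employee table and the Active = 1 filter from the talk:

-- Query to support:
--   SELECT FirstName, LastName FROM dbo.Employee
--   WHERE LastName = 'Cherry' AND Active = 1;
CREATE NONCLUSTERED INDEX IX_Employee_LastName_Active
ON dbo.Employee (LastName)
INCLUDE (FirstName)
WHERE Active = 1;   -- only rows with Active = 1 are kept in the index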
If the index is going to be used, create it; if it isn't going to be used, don't. If there are orphaned indexes that aren't being used anymore because they've been replaced by a different index, go ahead and get rid of them. Unused indexes can be either disabled or deleted. I typically recommend disabling them for a period of time; that way, if somebody complains a week later that something slow used to be fast, you can just turn the index back on and it'll start being used again, without having to hunt down the definition of the index. The cool thing about a disabled index is that it's still there, you just can't use it — the schema of it is stored, not the actual data behind it.

I've got some more reading for you: if you go to the URL on the slide, there's plenty more reading you can do on some of the topics I've talked about, as well as some topics I didn't cover — I can only fit so much into a roughly hour-long presentation, and unfortunately our time is about up. If you need help, feel free to shoot me an email; I'm more than happy to answer questions. And if questions come up later that we didn't get answered today, shoot me an email and I'll answer those too. With that, I'll go ahead and pass it back to our host. Thank you very much for coming.
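A minimal sketch of the disable-rather-than-drop approach described above — index and table names reuse the illustrative employee example:

-- Disable the index: the definition stays in the metadata, the index data is
-- deallocated, and SQL Server stops using and maintaining it.
ALTER INDEX IX_Employee_LastName ON dbo.Employee DISABLE;

-- If something that was fast is suddenly slow a week later, a rebuild
-- re-enables the index without having to hunt down its definition.
ALTER INDEX IX_Employee_LastName ON dbo.Employee REBUILD;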
Thanks, Denny, thanks for the session. We do have a couple of questions come through. First one for you: do we need to run DBCC CHECKDB on a daily basis?

Do we need to run CHECKDB on a daily basis? It depends on how bad it would be if you didn't catch corruption quickly. I try to run CHECKDB as often as I run a full backup, because if you don't run CHECKDB every day but you do take full backups every day, you're probably only keeping two or three days' worth of full backups — and if we're only running CHECKDB every week, it could be three or four days before we find out there's a problem, and we may no longer have an uncorrupted backup. So if you're doing daily backups, I'd recommend running CHECKDB daily; if you're doing weekly backups, I'd recommend running it at least weekly, and I'd prefer daily. It just depends on how long CHECKDB takes to run and how long a window you've got each night. General rule of thumb: run it as often as you can get away with. If you can't do it every day, do it every couple of days or every week, but run it as often as you can.

Okay, the next one: if we're running a backup and at the same time we've got some insert operations happening, will those inserts be included in the full backup, given that the backup reflects the state at the end of the operation? Yes — if you do an insert in the middle of the backup operation, it should be in that full backup when you restore it, because that transaction will complete during the backup operation. If the transaction hasn't completed by the time the backup finishes, it won't be in that full backup: if you start the backup, then start a long-running insert, and the backup completes first, then when you restore that full backup your new rows will not be there, because the transaction had not completed and it will be rolled back when the restore completes.

Okay. Would it be okay to restore a database to another server and run DBCC CHECKDB on the restored copy — in other words, to offload the CHECKDB work to another system? Yes, you can offload most of the CHECKDB process to another server, either by restoring a backup there or by doing it against an availability group secondary or something like that. On the production server you still need to run part of the CHECKDB process: you would run DBCC CHECKDB WITH PHYSICAL_ONLY on the production server, and everything else you can do on the secondary. WITH PHYSICAL_ONLY simply checks the physical structures — the physical disk underneath — and makes sure that's okay; all the logical data checks can be done on the second server, and realistically those logical checks are the most expensive part of the whole process, so doing them on another server works fine. So, short answer: yes.

Okay. How can we determine how much tempdb space we need when running CHECKDB? I've run it with ESTIMATEONLY but still run out of tempdb space. There's no really good way to estimate that, because you'd need to know how much data change you're going to have and how big your objects are, and there's all sorts of voodoo involved. If you're running out, I'd say just keep adding more — make your tempdb drive bigger — because there are simply too many moving parts to make a good estimate of how much space you'll need.

Now, a question around backup to Azure: it looks like SQL Server 2012 SP1 CU2 introduced the functionality for backup to URL. Do you know if there's any talk or possibility that it may be back-ported to 2012 RTM? I would not expect it to be — you'll need to upgrade to that service pack level.

Two quick ones to finish. You mentioned a URL — I'm assuming this question is about backup to URL for Azure, or mapping a drive letter for a backup to the blob store? Yes — backup to URL is the technical name for backup to Azure, or you can back up directly to a network share in the Azure blob store using the new file share feature, which came out in Azure storage about three or four months ago.

Okay, so with that, thank you, Denny, for your time. Just to confirm, the session has been recorded and will be placed on the DBA Fundamentals website in a couple of days' time; it will also be on the YouTube channel for you to re-watch. Denny, will we be able to put your slide deck on the DBA Fundamentals website as well? Yes, I can send you a PDF of the slides. Excellent — so for those who would like a copy, the slides will be made available at the same time in PDF form. Thank you for your time today. I look forward to seeing you all at the same time next month, when we'll be hearing from Tim Radney talking about, I think, Azure SQL databases; that session is open for registration, and I look forward to seeing you all then. Thank you for coming along.
Info
Channel: DBAFundamentals
Views: 8,662
Keywords: sql, database administration, dba, 101, basics, best practices, denny cherry, dbafundamentals, steve cantrell, backup, restore, create database, index, index maintenance
Id: FFN6KOKZY8s
Length: 69min 50sec (4190 seconds)
Published: Sun Jun 26 2016