Azure SQL Virtual Machines Reimagined: Sizing (Ep.1) | Data Exposed Live

Captions
Anna: Hi, I'm Anna Hoffman, and welcome to this episode of Data Exposed Live. We are super excited to have you joining us today from wherever you're streaming, and we encourage you to join us on Learn TV and engage with us. Today we have a very exciting special episode planned around virtual machine sizing, and when it comes to this topic we thought, who better to bring on than the experts for SQL Server and the experts for virtual machines? So without further ado, I want to bring on our two speakers. The first speaker we have today is David Pless. David, can you tell us a little bit about what you do?

David: Hello, my name is David Pless. I'm a SQL program manager, and I focus on Azure SQL virtual machines, in particular sizing efforts and storage optimization.

Anna: Awesome, thanks David. And we also have Pam. Hi Pam, can you introduce yourself and tell us a little bit more about you?

Pam: Sure. Hi everybody, my name is Pamela Lahoud, and I'm also a program manager on the SQL Server team. I'm the lead PM for SQL Server on Azure Virtual Machines, and I also handle a few of the database engine features, like the storage engine, backup, and tempdb.

Anna: Cool, awesome. So we have some highly technical folks. Not only are you both working on the SQL VM team, but you've also been working on SQL Server for a very long time, probably many combined years of SQL Server experience.

Pam: More than I want to admit to, that's true.

Anna: All right, we're really excited to have you both on the show today. This topic is a big one and can sometimes be a little overwhelming, so I'd love to hear: when people think about SQL Server on-prem versus SQL Server running in an Azure virtual machine, what are the big differences, if there are any, when it comes to managing performance?

David: A lot of what you do on-prem is going to be the same as what you do in the cloud. You're attacking the virtual machine's performance scenarios in a very similar way: you still want to use core features like columnstore indexes and Query Store, you have the same tuning methodologies, the same methods for measuring performance and going after high-resource scenarios. A lot of those things are the same. What's different is being part of the Azure fabric and taking advantage of the features in the cloud, and then grasping the sizing and storage differences, and what you should be looking for and addressing to get the best performance and scalability for your deployment in the cloud.

The other thing I would say is that we're so used to over-purchasing on-prem. We're so used to that whole scenario of "if I don't use my budget, I'm going to lose it," so we get these massive machines. I was talking to Pam about this the other day: someone opens Page Life Expectancy and shows some number so huge I don't even know what it is, and their % Processor Time never goes above 10, because they've got so much machine that it's not coming close to any high-water mark, even in a peak workload scenario. That's not the approach you're going to take in the cloud. This is the whole difference between CapEx and OpEx: you're looking for a right-sized machine, and a lot of what we're going to talk about today is to help you do that.

Pam: I want to call out one more very big difference we see in customers moving from on-prem into an Azure VM, and that's the storage aspect of the system. Most customers on-prem have purchased a large storage array, a SAN device, with many different servers connected to it. That thing is high-powered, and in most cases it can handle as much I/O throughput as you throw at it, so customers are really not thinking about how much I/O they're consuming. They may know how many cores are in the machine and how much RAM is in the machine, they probably have an idea of their CPU percentage over time and how much of that memory they're using, and the SQL Server counters and all that, but they may not be paying attention to IOPS and throughput: how much I/O are you really driving? That is a big challenge when you come up into Azure, because, as David was saying, you don't have this giant unlimited SAN. You are buying disks that you attach to the VM, and those disks have performance characteristics, and the VM itself has performance characteristics that you have to pay attention to. It's very different from what you're used to on-prem. So a lot of what we're going to talk about today focuses on I/O sizing.

David: That's a really good point, because even when working with cloud solution architects and customers, when we get an escalation we'll sometimes ask, "What's your IOPS requirement for this particular app? What's your throughput requirement for this scenario?" These are even shrink-wrap apps, apps they've purchased off the shelf and have a support agreement for, and that information isn't necessarily provided by the vendor. These are things you're going to need to capture yourself in order to prepare for a proper migration, so you can target the machine you really need.

Anna: Gotcha. Okay, so it seems like there are quite a few things to consider here. We're going to take it step by step and walk through how to do this properly?

David: Yeah, absolutely.

Anna: Awesome, we're looking forward to it.

David: So what I'm going to jump into is the performance guidance for SQL Server, and there's a huge list here. We have entire workshops, many workshops, that cover just part of what's on this slide.
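The IOPS and throughput requirements Pam and David say you must capture can be estimated by aggregating perfmon-style samples taken over a representative peak period. A minimal sketch, with invented sample values (the counter data and the percentile choice are assumptions, not guidance from the session):

```python
# Sketch: derive IOPS and throughput targets from perfmon-style samples.
# Sample numbers below are made up for illustration.

def percentile(samples, pct):
    """Nearest-rank percentile of a list of numeric samples."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# One value per collection interval: total disk transfers/sec (IOPS)
# and total disk bytes/sec, captured while the workload is busy.
iops_samples = [1200, 3400, 2800, 5100, 4900, 2600, 4700]
bytes_per_sec_samples = [40e6, 110e6, 95e6, 180e6, 170e6, 90e6, 160e6]

iops_target = percentile(iops_samples, 95)
throughput_target_mbps = percentile(bytes_per_sec_samples, 95) / 1e6

print(f"Size for ~{iops_target} IOPS and ~{throughput_target_mbps:.0f} MB/s")
```

Sizing to a high percentile rather than the average keeps peak-period behavior from being throttled while still ignoring one-off spikes.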
Keep in mind that it's still SQL Server: things like setting Lock Pages in Memory, setting max server memory, following best practices like instant file initialization, and using features like Query Store and columnstore indexes, along with the scripts, the maintenance, and the approaches you take, all of that still applies. The key thing here, and this really goes to the point Pam was bringing up, is that IOPS testing is critical. You have to have that; it's the basis of right-sizing your approach in the cloud. It's almost like getting as close to the sun as you can without a performance issue being visible or obvious to the end user. That's what we want to avoid. We'll get into the terms we use, like throttling, and we'll talk about governance and things like that. This is all a matter of reserving resources as part of the VM, or as part of the SQL experience, while making sure it's not something the user has to contend with. We want the user experience to always be solid.

Anna: Okay, awesome. This is really useful, and I think that's probably where you're headed next, but when I go to deploy a VM there are so many different choices. I'd love your input, and I'm sure our viewers would appreciate understanding exactly this: how do we size and scale accordingly for our workloads?

David: This is probably the most important topic. We want you to have that IOPS and throughput already figured out, because you're going to need an idea of what you're targeting and what your needs are. There are all sorts of virtual machines out there for all kinds of workloads: dev/test, burstable, small scale, large scale.
There are purpose-built machines, for things like SAP; you'll even see "for large database systems" in the description. There are GPU-intensive machines, and so forth. The other thing is that the portfolio is constantly growing; I've updated this slide several times recently just because a new machine will pop in. For our purposes in this session, we're focused on SQL Server and data warehouse scenarios, the 4, 8, 16, 32, and higher core counts, and we're looking for really strong memory-to-core ratios and strong throughput with the right mix of performance features. There are eight grouping types, then we have series, and there are about 380 different virtual machines. So if you were to go look at the pricing calculator, or just look at a list of sizes and the raw VM counts, where do you start? We're going to make this really easy.

First of all, I'm going to tell you to target the memory optimized family for your core enterprise scenarios. Our reason for this is that you get large amounts of memory and CPU, and high memory-to-core ratios, for strong relational database scenarios. One of the key things is premium disk and cache support, a very important concept that we'll get to, and Write Accelerator, which is particularly strong for the M-series. In this grouping there are two new series in preview, the Msv2 and the Mdsv2. They're only on Cascade Lake processors and only Gen 2, but they're designed to be a mid tier in this enterprise range, and they have both disk and diskless options. These are perfect for enterprise-level SQL Server workloads.

The next one I'll mention is general purpose and confidential computing. Think about departmental workloads: compared to the M-series, maybe lower user counts and smaller, purpose-built, shrink-wrap applications. It's good for those departmental scenarios where you're not running massive warehouse-type workloads or queries, but it does have a fairly respectable memory-to-core ratio for those applications.

An honorable mention here is storage optimized. I would call this a specialized workload; it occupies a unique position among the VMs we're discussing. It has very high disk throughput and I/O, really good for big data and ETL scenarios, like operational data store scenarios or data warehouse processing where you could recreate the source data from, say, an Azure data lake. Why do I say that? Because the core storage for this Lsv2 series is ephemeral in nature, meaning that if that storage were to get deallocated, it would be recreated and the data on it would be lost. You're choosing this because of the extremely high performance of that particular storage, but you've prepared for that, and you're targeting this machine for the scenarios I've mentioned. Pam, anything to add on this? We've shrunk it down pretty tight.

Pam: Not for this slide. I think this is a good segue into some of the other topics we want to cover.

David: Perfect. Okay, so let's talk naming conventions really quick. You may have seen some of these VM names on the previous slide, like Edsv4 or Dasv4, and wondered what they actually mean. There's rhyme and reason around this: the name is the family plus the subfamily, telling you the number of cores it has and some of the key features, plus the version. What we're really going to tell you to pay attention to for SQL are these four: Is a local ephemeral disk present? Is it a high-end memory scenario? Is it premium storage capable? And is it isolated? I tend to look for "ms" when I'm evaluating a machine; "ms" is the key combination there.
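The decoding rules just described (family letter, core count, feature letters, version) can be sketched as a small parser. The regex below is an approximation of the convention as explained here, not an official parser, and the constrained-core handling anticipates the "-16" style suffix discussed next:

```python
import re

# Sketch: decode an Azure VM size name along the lines described above.
FLAGS = {
    "i": "isolated",
    "d": "local ephemeral disk present",
    "m": "memory intensive",
    "s": "premium storage capable",
}

def decode(size_name):
    m = re.fullmatch(
        r"(?P<family>[A-Z])(?P<cores>\d+)"
        r"(?:-(?P<constrained>\d+))?"      # constrained vCPUs, e.g. M128-16ms
        r"(?P<flags>[a-z]*)(?:_v(?P<version>\d+))?",
        size_name,
    )
    if not m:
        raise ValueError(f"unrecognized size name: {size_name}")
    info = {
        "family": m["family"],
        "cores": int(m["cores"]),
        "features": [FLAGS[f] for f in m["flags"] if f in FLAGS],
        "version": int(m["version"]) if m["version"] else 1,
    }
    if m["constrained"]:
        info["visible_cores"] = int(m["constrained"])  # the cores you license
    return info

print(decode("E80ids_v4"))
print(decode("M128ms"))
print(decode("M128-16ms"))
```

For `E80ids_v4` this reports family E, 80 cores, version 4, with the isolated, local-disk, and premium-storage flags, matching the example David walks through next.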
That's especially true for the M-series. And isolation: you may need that for certain security-based situations, high-security environments. Here's an example, the E80ids_v4: it's E family, I have 80 cores, it's isolated, the local temp disk is present, and it's premium storage capable. Perfect, it tells me right up front what's going on. Here's another one, that M-series I keep bringing up: the M128ms, memory intensive and premium storage capable.

And here's a very important concept I definitely want everyone to understand: what we call constrained cores. You'll see a dash and some number, which will be half or a quarter of the parent VM. This is specifically for a licensing scenario. In this case I have an M128, that's the parent VM, dash 16: those are the cores visible to the VM, the ones you're licensing, in particular for SQL Server.

So let's talk about optimizing for SQL Server licensing costs. This is only available for the memory optimized virtual machine sizes, and it's perfect for SQL scenarios. Database workloads like SQL Server need high memory, storage, and I/O bandwidth, but we don't necessarily need all that processing; these database workloads are really not that CPU-intensive, we tend to be memory- and I/O-intensive. So what you can do is get a constrained version of that VM, again half to a quarter of the parent. The key thing is that this reduces the cost of software licensing while maintaining the same memory, storage, and I/O bandwidth of the parent machine. One thing to be aware of: the benefit is from a SQL Server licensing perspective. You're still paying the same compute costs and Windows licensing costs as the parent VM, so keep that in mind; this is specific to SQL Server licensing.

Anna: David, just to follow up on that one, because this was one that threw me for a question the first time you explained it to me: can you tell me a little about why I might choose, say, a -4 versus a -16, and how that's going to affect performance?

David: It's a really good question. Coming from on-prem, you would have a good idea whether you're CPU bound or not. In most of the scenarios I've seen, most of the time you're either I/O bound or memory bound. If you're CPU bound, that's definitely something I would want to test. The other thing to keep in mind is that it's really easy to compare memory between on-prem and the cloud, or between machines on-prem, but it is not so easy to compare processors. Because of hyper-threading, different processor architectures, virtualization stacks, and pressure on the host, when you're comparing something on-prem to the cloud, you never want to compare core to core; it's not an apples-to-apples comparison. That said, if you're seeing CPU pressure, I'm definitely going to test before I choose a lower core count. The example you gave is a really good one, a -4 versus the parent's 16: if I'm CPU bound, I'm definitely going to test before I rush into deployment.

Now, let me say this quickly, because I like the way you're positioning that: it is really important to get on the table. What do I mean by that? Say you have an M-series and you go into the documentation (if someone could drop the M-series documentation link down so everyone can see what I'm referring to; it's a bit like what you see on the screen). Once you have a position on the table, your ability to scale up and down is going up and down the table. So Anna's question is really good: what if I have some pressure and I need to move up for whatever reason, whether it's CPU, memory, or I/O? I can essentially scale up to those needs and scale back down as my needs change. It could be end-of-quarter processing; it could be any kind of seasonal scenario. That is one thing that is really important: your storage stays the same, but you're able to change that compute as you need.

Anna: That definitely helps, and it's good to know that even if I size this wrong, maybe it's better to size wrong on the high side and then scale down once I've done some testing.

David: Right, good point.

Pam: I'd love to add something specifically about the constrained cores as well. The reason we have constrained cores is that if you need a large amount of memory and a large amount of I/O throughput in Azure, it goes in lockstep with the core count: the higher the core count, the more memory and the more I/O you get. But for SQL Server workloads, like we were saying before, your core count may be low. On-prem SQL Server systems tend to have lower core counts but tremendously high I/O throughput and a large amount of memory, and that combination is not common in Azure. That's where the constrained cores come in. That M128-16 David was talking about has the memory and the throughput of the M128, which is huge, but if you don't need 128 cores, if your SQL Server only has 16 cores and you're fine with 16 cores, you don't want to license SQL Server for 128 cores, because SQL Server core-based licensing is incredibly expensive. By going for only 16 cores in your VM, you get that high memory and high I/O throughput, but you don't have to license all those cores for SQL Server. What this means is that inside the guest of one of these constrained-core VMs, you will not see 128 cores; you will see the constrained number. SQL Server will only be able to use those 16 cores, but it will be able to use all the memory and all the I/O. So that's where these come into play: you only need 16 cores, but you need, say, two terabytes of RAM. That's where you get into these constrained-core sizes.

David: Yeah, to Pam's point, the only thing that changes is right here: the number of vCores.

Anna: Gotcha, cool. That was helpful, thank you both. I think our viewers will find it helpful as well.

David: Very good. So with the constrained cores in mind, what does this really boil down to? Now we're talking about your production SQL Server machines, again memory optimized. The DSv2-series, sizes 11 through 15, has the same memory and disk configuration as the D-series. The key thing here is a memory-to-core ratio of 7 across these virtual machines. We'll tell you that we target 8, that's where we want to be, but this makes it good for a departmental scenario. It has premium storage and premium storage caching, the constrained cores are available for everything in the VM set, and the top end of these machines has accelerated networking support, which is really important.

Then the Edsv4, which Pam and I both like, and increasingly so, from our experience and the customer experience. These VMs have a large local solid-state drive capacity, strong local disk support, plenty of RAM, and a consistent core-to-memory ratio of 1:8 across the machines. It's perfect for your standard SQL Server workloads. It has premium storage and premium storage caching, the constrained core sizes are available for the higher-end 64 and 80 core options, and above four cores it has accelerated networking support as well.
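The constrained-core licensing arithmetic Pam and David walked through a moment ago can be sketched in a few lines. The prices below are hypothetical placeholders, not real Azure or SQL Server list prices; the point is only the shape of the calculation (compute cost tracks the parent VM, licensing tracks the visible cores):

```python
# Sketch: why constrained cores cut cost. Prices are hypothetical.
SQL_LICENSE_PER_CORE_MONTH = 500.0   # hypothetical per-core license price
COMPUTE_COST_M128_MONTH = 12000.0    # hypothetical; same for parent and constrained

def monthly_cost(parent_compute, licensed_cores):
    # Compute (and Windows) cost tracks the parent VM; SQL Server
    # licensing tracks only the cores visible to the guest.
    return parent_compute + licensed_cores * SQL_LICENSE_PER_CORE_MONTH

full = monthly_cost(COMPUTE_COST_M128_MONTH, 128)         # M128ms
constrained = monthly_cost(COMPUTE_COST_M128_MONTH, 16)   # M128-16ms
print(f"M128ms:    ${full:,.0f}/mo")
print(f"M128-16ms: ${constrained:,.0f}/mo  (same memory and I/O)")
```

With these placeholder numbers the constrained size saves 112 cores' worth of licensing per month while keeping the parent's memory and I/O limits.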
So it's a really solid machine. Then the M, and the asterisk here is that we're talking about multiple M flavors: the M-series, the Mv2 series, and those two new ones in preview, the medium memory options. These are high core counts with strong memory-to-core ratios across the board; the standard M64, for example, has a 28:1 memory-to-core ratio. You can see here we have core-to-memory ratios of 1:30 without constrained cores and 1:122 with constrained cores across the M-series, and that is really powerful. So what we're talking about here is mission-critical, data-warehouse-level virtual machine options, and it's really straightforward what we're recommending for your key SQL Server workloads.

Anna: Awesome, this is really helpful. I especially like how you added the tags: this one is better for entry-level, versus all-purpose, versus mission-critical, really high throughput workloads.

David: And what's really nice is that as you go through the series and get an idea of what you're looking at, and again position yourself in the table based on what you're observing on-prem, what your needs are, what you're pushing on from a resource-percentage perspective, what your wait statistics are saying, for example, where it hurts, then you can target the machine for your scenario for the best performance. And if you're in the Azure portal, you can take a really good look at what's available there as well. So let me show you an example. In the Azure portal, to get here all you have to do is go to compute, go to your virtual machine, and we're going to look at some sizing scenarios. I want to make it really clear how you can get a good grasp of what's available to you. All you need to do is say "See all sizes," and you'll notice you've got a list showing things like what's being used by most customers and some of the series you have. Initially it's grouped by series; you can change the dropdown to have no grouping, but initially let's leave it. What I'm going to tell you to do is just click "Add filter," and pick the family you're looking for. We've been looking at memory optimized, and I mentioned storage optimized, so let's throw that in there too. I'll click "Add filter" again; I told you it's really important to have premium disk support, and you may also discover that you need ephemeral disk support, which we see from time to time, and that's a really strong scenario. So here I've really shrunk this down. Now, you don't see the M-series in my list, because I may have a policy that's not making it available, or I may have insufficient quota for that particular scenario. So what I'm going to do is switch to no grouping and sort by the amount of memory, and there you go; you even see the constrained cores. I'm not scrolled over, so let me move this over here. There we go. Now, what I'm seeing here is East US 2.
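The portal filtering David just demonstrated can be approximated with a plain filter-and-sort over a size catalog. The catalog entries below are illustrative, not a live size list, and the field names are assumptions made for this sketch:

```python
# Sketch: narrow a VM size list the way the portal filters do.
catalog = [
    {"name": "E16ds_v4", "family": "memory optimized", "vcpus": 16,
     "memory_gb": 128, "premium_disk": True},
    {"name": "D16s_v4", "family": "general purpose", "vcpus": 16,
     "memory_gb": 64, "premium_disk": True},
    {"name": "M64ms", "family": "memory optimized", "vcpus": 64,
     "memory_gb": 1792, "premium_disk": True},
    {"name": "A8_v2", "family": "general purpose", "vcpus": 8,
     "memory_gb": 16, "premium_disk": False},
]

def shortlist(sizes, families, min_vcpus=0, min_memory_gb=0):
    """Filter by family and premium disk support, then sort by memory."""
    return sorted(
        (s for s in sizes
         if s["family"] in families
         and s["premium_disk"]
         and s["vcpus"] >= min_vcpus
         and s["memory_gb"] >= min_memory_gb),
        key=lambda s: s["memory_gb"],
    )

# "I know I need at least 16 vCPUs and at least a terabyte of RAM":
for s in shortlist(catalog, {"memory optimized"}, 16, 1024):
    print(s["name"], s["memory_gb"], "GB")
```

A few predicates take the list from hundreds of sizes down to the 20-to-30 window David mentions next.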
This is an estimate in my environment, but it gives you a good idea of what the pricing is going to be, and you can make your plans accordingly. Think about how quick that was; it's not very long at all. And by the way, you can also adjust your CPU and memory: you can come in here and say, look, I know I need at least 16 vCPUs and at least a terabyte of RAM, so what do you have available for me? In a few clicks you can narrow this down really fast and start targeting machines that will be of strong interest for a SQL Server environment.

Anna: I think it's really powerful. It seems kind of simple to just filter, but personally I've never filtered these options before, and it really narrows it down. It took me from a list of hundreds of VMs to less than 20.

David: Right, and actually 20 to 30 is the window we see for SQL workloads. It's funny you said that; that's exactly it, spot on. All right, so let's talk about storage. We're planning a future session with a deep dive on storage, and really storage optimization, what you're looking for from that perspective. This part of our conversation is specifically around sizing, so this isn't going to be a deep dive into all things storage; it's focused on what you need to know in order to size: what your choices are, what you should avoid, and what you should choose.

First of all, let's get to the heart of what we want to look at. Notice there are some things that are not on the slide: standard spinning disks and standard SSDs. I'm not even talking about those here; they're not on the screen for a reason. From a supportability perspective, when those are chosen we tend to see a lot of customers having issues. I mentioned throttling earlier, and you tend to hit disk throttling very quickly when you pick either standard spinning disks or standard SSDs. So we're going to recommend premium SSDs primarily, then Ultra Disk options for your transaction log, and using the local ephemeral disk for tempdb in most cases.

Premium SSD disk support, and you'll see sizes from P1 up to P80, is designed for I/O-intensive scenarios, with a lot of low-latency support. We're going to tell you to target the P30 and the P40 for your data disks, because you need that caching support, and P30 to P80 for your log disks, because you don't need the caching support for your log disk. I would avoid the P1 to P20. By the way, since I'm rattling all this off, if someone doesn't mind dropping the Azure managed disk types link that's in the description, that would be good. When I talk about the P1 to P20, you'll notice that they support disk-level bursting. That's great, and it may be okay for dev/test or some really small workload scenarios, but the problem with bursting is that it can lead to an inconsistent experience and perceptions of poor performance, so I'm not a fan of using it in a production arena. Again: P30 and P40 for your data files, P30 and up for your log disks. And when you need more performance, you're going to use Storage Spaces; you'll put the disks in a storage pool in order to expand your capacity and your performance. And again, for the most part, and there may be some caveats we'll get to in a future session on storage, target tempdb on that local ephemeral disk when you can. If you follow the storage configuration in the Azure Marketplace when you're creating a VM, you're guided toward these best practices by default. Pam, anything you want to add before I move on to other topics?

Pam: Nope. We put the link up for you for the disk types, and that covers all the
different details of the disks David was just talking about, and it also tells you about the different sizes that are available, the P30 and the P40 that we were highlighting.

David: Excellent. So, again, this is high level and we'll get into it a lot deeper, but let's talk about the nature of I/O requests. The rule of thumb here is that OLTP apps generate millions of small random I/O requests and need higher IOPS, whereas data warehousing generates large sequential I/O requests that need more throughput. IOPS is the number of requests your application sends to storage in a second; you can map IOPS to Disk Reads/sec and Disk Writes/sec in perfmon (we have a slide on the counters in just a second). When you think of that high volume of I/O requests, you're thinking about OLTP. Throughput is the amount of data read from or written to the disk per second; this is all about volume, how much am I delivering? You can map throughput to Disk Read Bytes/sec and Disk Write Bytes/sec in perfmon. Now, latency: you're going to measure latency in different ways, but from a data perspective, latency is the time it takes for an application to issue a request, send it to storage, and get the response back to the client. Think about this in terms of a full round-trip response. Outside of the data layer, you'll probably also want to grab your .NET counters to get the full application performance picture, but you generally measure latency in terms of application response and responsiveness, in milliseconds. That being said, these are the core performance counters to measure application performance end to end. What we led with, and what Pam discussed early on, could not be more important: you've got to capture your IOPS and throughput in order to properly size, along with your max memory needs and whether you are CPU bound. Getting your IOPS, getting your throughput, and understanding in general what the latency looks like is probably the most important thing here.
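The OLTP-versus-warehouse rule of thumb above comes down to the identity throughput = IOPS × average I/O size. A small sketch with illustrative workload shapes (the request counts and I/O sizes are assumptions chosen to show the contrast):

```python
# Sketch: IOPS vs. throughput for small-random vs. large-sequential I/O.

def throughput_mbps(iops, io_size_kb):
    """Throughput in MB/s implied by an IOPS rate and average I/O size."""
    return iops * io_size_kb / 1024

# OLTP: millions of small random requests -> IOPS-bound.
oltp = throughput_mbps(iops=20_000, io_size_kb=8)
# Data warehouse: fewer, large sequential reads -> throughput-bound.
dw = throughput_mbps(iops=1_600, io_size_kb=512)

print(f"OLTP: 20,000 IOPS at 8 KB   = {oltp:.0f} MB/s")
print(f"DW:    1,600 IOPS at 512 KB = {dw:.0f} MB/s")
```

The OLTP shape hits an IOPS cap long before it moves much data, while the warehouse shape saturates throughput at a modest request rate, which is why the two workloads size against different limits.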
Pam: And these counters are super basic. If you were doing a full performance analysis, you would be collecting a whole bunch more counters, but these are the core ones you need in order to correctly choose your VM size. Like David was saying earlier, maybe you have 128 cores on-prem but your processor time is 20 percent; don't buy 128 cores when you come to Azure, go smaller. So that will help you, and the memory will help you, but the IOPS, the throughput, and the latency are key for you to choose not only the right VM size but also the right type of storage, whether you're going premium or ultra, and ultimately how many disks you need to buy. Because what David is going to segue into next is how we do the sizing not just for the quantity of disk you need, in other words "I need four terabytes of storage," but for the IOPS and throughput you need, which are also driven by which disks you choose.

David: That's a really good point. Let's move over there, because that really gets into Storage Spaces and the storage configuration. Without Storage Spaces, I'd be going and looking for a bigger and bigger disk, and the price is kind of exponential in that regard. You can actually look at a scenario where you're moving from a P60 up to a P80, and remember, you may see in the link that was shared that beyond four terabytes, caching support is not there. This is why we recommend the P30 and the P40 for your data files. Take a look at the price comparison in this situation.

Now, here's what we have. We're talking in terms of Storage Spaces, and what Storage Spaces does is make it possible to expand capacity and increase performance beyond the individual drive, for IOPS and throughput as well. It protects from drive failures, like a software RAID, and in the Marketplace we make it very easy to do this configuration, with the proper settings, optimized for performance. In this comparison we have four P30s in a storage pool: the combined IOPS is 20,000, the combined throughput is 800 megabytes per second, and the total capacity is four terabytes. Compare that to a P50, where you're only getting 7,500 IOPS and 250 megabytes per second, and there's a difference of what, 40 dollars per month? This shows you what Storage Spaces can really do for you: not only resiliency at the storage level, but also expanded performance, IOPS and throughput. And again, we make it very simple. Here's what I mean by simple: all I have to do is change the number of disks, I can see what I'm getting from an IOPS perspective and from a disk perspective, and at the bottom here, if you exceed what the VM can handle, it will let you know that you're getting into a throttling situation in the storage configuration.

Anna: Now this part is actually really interesting to me, David, and it's something you both pointed out to me a while back. Not only is this going to configure those disks for you, but it's going to set some really important settings that you might not notice unless you're an expert. Can you tell us what we're doing under the scenes when we set these disks up for folks?

David: If you pick the transactional processing template, for example, the allocation unit size, your block size, is going to be 64 KB; the interleave, essentially the stripe size, is going to be 64 KB; and the number of columns, in this case eight, means the number of disks is set appropriately for best performance. That's done for you. Otherwise you would have to know the PowerShell commands, and enough about Storage Spaces, to set that yourself and to look for it specifically.
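The pool comparison above is simple multiplication: striping across managed disks scales IOPS, throughput, and capacity with the disk count. A sketch using the per-disk numbers quoted in the session (P30: 5,000 IOPS, 200 MB/s, 1 TB; P50: 7,500 IOPS, 250 MB/s, 4 TB), which may differ from current published limits:

```python
# Sketch: combined limits of a simple stripe across identical managed disks.

def pool(disk_iops, disk_mbps, disk_tb, count):
    """Aggregate performance and capacity of `count` striped disks."""
    return {"iops": disk_iops * count,
            "mbps": disk_mbps * count,
            "tb": disk_tb * count}

four_p30 = pool(5_000, 200, 1, 4)
one_p50 = pool(7_500, 250, 4, 1)

print("4 x P30:", four_p30)
print("1 x P50:", one_p50)
```

Both options land at four terabytes, but the four-disk pool delivers roughly 2.5x the IOPS and 3x the throughput for a similar monthly price, which is the point of the slide. Note the pool totals are still subject to the VM's own IOPS and throughput caps.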
Otherwise, you would have to know what the PowerShell command is, and you would have to know enough about Storage Spaces to set that and to look for it specifically. So feel free to go ahead and use this template, set things up similar to the way it is here, then go on the machine and check your interleave, your block size, and the number of columns in the OS, and you'll see that it's optimized for your experience. Also notice here at the bottom that your tempdb is targeted to that ephemeral disk by default; you can of course change it if you see fit. Yeah, one thing that's pretty cool, the best practices are built in. Sorry Pam, go ahead. Best practices are built in, yeah. One thing I wanted to point out is that it's a general rule of thumb, when you're putting together storage in Azure, that a large number of smaller drives is going to be more cost effective than one single large drive, and what Storage Spaces allows you to do is boost that I/O throughput by putting those drives together. Now, the thing I mainly wanted to point out is that these are managed disks in Azure, which means that all the redundancy you need is built into the managed disk service itself. So when you build that storage pool, like we were saying, it's software RAID, so you could do mirroring to get extra protection against drive failures, but you don't need to do that here, because all of that redundancy is built into the managed disk itself. The only thing we're using Storage Spaces for here is to stripe across all the drives so that you get the best I/O throughput. That's where that column count comes in: you want to make sure the column count matches the number of drives, which means we're striping across all the drives evenly. We're not using any mirroring or parity; we're just doing a straight, simple stripe set, and that's going to give us the best performance we can get. So that's one of the reasons that storage config is important: you don't need to know any of that. You just say how many drives you want, see how much I/O and throughput you're going to get, and we'll do all that configuration for you. Right, awesome. We did get one question in, I'm just going to toss it up on the screen. It's from Rao; he says, "the D drive tempdb, only 2,300 IOPS." Any insights on what Rao might be asking about? Yeah, I think, so, depending on the VM size, the local ephemeral drive will have different capacity. This is one of the reasons why we were saying that the Edsv4 series is great, because the Edsv4 series has an enormous amount of local throughput. So I think it's the E64ds v4... these are the M-series ones that you've got on the screen. Yeah, oh, but yeah, there you go. So what David's showing here in the documentation is the max cached and temp storage throughput; the temp storage is where that ephemeral drive lives, so that's the cache size, and then there's a temp storage size. Two columns to the left of where you're highlighting, David, is the temp storage size; that'll show you the size of that local drive, and then the cached and temp storage throughput will tell you how much throughput you have against that local drive. That's also combined with the cache because, and David's going to talk more about this, the cache also uses that same local drive on the host. So it depends on which VM size you choose; different VM sizes will give you a different capacity for that local drive. If you stick to the series that we were recommending, those will generally have a high I/O throughput against the local drive and also a decently high capacity. Yeah, so he said he selected an E32-16ds v4. Yeah, I don't know what the specs are for that off the top of my head, but that may be different. E32, yes, that would be this one here. Oh, yeah, there you go, you can see it.
It actually got a pretty decent throughput, but it's only got 1.2 terabytes. Still, that should actually be pretty decent, I would say, for local throughput. Cool, awesome, well, thanks, and Rao, hopefully that helped you navigate, or at least see the thought process of how they look at these VMs and the specs. All right, so let's get into it. You heard us mention the throttling concept a couple of times; let's talk about a couple of these scenarios, because this really comes down to how this performance works. Okay, so, the application: I have my VM, you see it's a D8s v3, and I have some disks down here. Virtual machines are going to have IOPS and throughput performance limits based on the VM size, and all the disks are going to have their own IOPS and throughput limits also. That means you could be throttled, or capped, at the disk or the VM level, and when capped, this is where an application may experience sub-optimal performance. So consider the example here; this is a disk scenario. We have that D8s v3: this machine has eight cores, 32 gigs of RAM, and, the key piece here, since I'm focusing on IOPS, it supports 12,800 IOPS. Okay, so this is an OLTP application, right? Now notice underneath, the key thing here is that I have these E30 disks, and in this case each has 500 IOPS and up to 60 megabytes per second. This app is asking for 10,000 IOPS, and let's just say it breaks down this way, it doesn't necessarily have to: a thousand IOPS are going to the OS, and then we're splitting 4,500 to each of the other disks. Here's where the problem comes in. I never talked about these E disks, right? Well, there's a reason for it: these are standard disks, and standard disks unfortunately are not what I would choose for a SQL Server, especially something in production, and you can see why. It is being limited, and we're only returning to the application 1,500 IOPS of the 10,000 it wanted. I can most certainly expect that the users are going to notice this, and this is what we would call a disk throttling scenario. So how do I solve for this? It's easy: upgrade your disks. Go to the P30s, and in this case those P30s can handle up to 5,000 IOPS each, so no problem whatsoever; I'm going to get back exactly what the application wanted. In fact, the disks could handle up to 15,000, but the question is, could the VM? Now, there's an important point buried in this next example that I want you to think about. The application in this case is going to do exactly what I suggested. We're saying, hey, we had a pretty good year, our service is fantastic, our customers want more and they're using more, so we've gone from 10,000 IOPS to 15,000 IOPS because business is good. I still have my D8s v3, and in this case I'm breaking it down as 5,000 IOPS each for the OS and my two data disks, so the disks can return that 15,000. The problem is, the cap this time is the VM, which can only support 12,800. Now here is a question that I'm going to ask y'all: is that enough throttling for me to decide I need a bigger VM? Again, I'm going to go back to the point that we made almost 45 minutes ago, I guess: this is capex versus opex. If the users are not seeing this as a problem, this may be fantastic. In fact, my organization may say, you know what, this is exactly a great use of an Azure virtual machine, giving the business exactly what it needs without us having to spend more than is necessary. My point is, not all throttling is bad. Does it affect the user experience? That's key. So I have these conversations all the time. What is a DBA going to do when stuff moves to the cloud? They're always going to be tuning, and I would say even more so in the cloud, because this is an operational scenario: you're always tuning, always looking for the best value. To me, if the user is not experiencing a problem here, that's a great value.
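The two capping scenarios walked through here can be modeled as a two-stage minimum: I/O is capped first at each disk, and the surviving total is capped again at the VM level. A minimal sketch using the numbers from the slides:

```python
# Minimal model of the throttling examples: per-disk caps apply first,
# then the VM-level uncached cap. Figures match the D8s v3 / E30 / P30
# scenarios described in the episode.

def delivered_iops(per_disk_requests, disk_limit, vm_limit):
    """IOPS the app actually gets back after disk-level and VM-level caps."""
    after_disk_caps = sum(min(req, disk_limit) for req in per_disk_requests)
    return min(after_disk_caps, vm_limit)

# Scenario 1: 10,000 IOPS wanted, split 1,000 (OS) + 4,500 + 4,500 across
# three E30 standard disks capped at 500 IOPS each -> only 1,500 delivered.
print(delivered_iops([1_000, 4_500, 4_500], disk_limit=500, vm_limit=12_800))    # 1500

# Scenario 2: upgrade to P30s (5,000 IOPS each) and grow demand to 15,000.
# The disks keep up, but the D8s v3 caps uncached I/O at 12,800.
print(delivered_iops([5_000, 5_000, 5_000], disk_limit=5_000, vm_limit=12_800))  # 12800
```

Whether the 12,800 result in scenario 2 justifies a bigger VM is exactly the capex-versus-opex judgment call David describes: the model tells you the cap is being hit, not whether users notice.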
All right, so, any questions there? I spent a little more time on that slide than I normally do, but I think it's an important point. Any questions, any other thoughts, Pam? I just typed in the chat, ABT: always be tuning. Yeah, always be tuning; I'm going to change my name, obviously. So, I had this VM pressure scenario; how do I get around that? Now, we've been talking about uncached IOPS and throughput this whole time, but there's a completely separate bucket, and Pam touched on this when she was talking about tempdb; tempdb is in this bucket too. You can see the cached and temp storage throughput and then the cache size for that D8s v3 that we've been looking at. This cached scenario for my data files is not available to me unless I've either specifically enabled read-only caching for my data files or I've followed the storage configuration in the Azure marketplace to make sure I'm following best practices. If you have not done that, and I can tell you Pam and I have both seen scenarios where customers roll their own environment, they haven't checked, and they're just not getting that cache support. You can get up to 30 percent or more benefit from having caching enabled. So how do you do that? Again, in a future session we're going to go really deep on this, but to show you in a screenshot: in your VM, go to disks, and what you want to be looking for is to make sure your data files are set to read-only. In fact, read-only is the only setting you're ever going to use, and only for your data files; for the log files, never change it, just put it to none. Definitely don't use read/write, because that could lead to corruption. Set the log files to none and the data files to read-only. If you happen to put tempdb on something other than the D drive, then perhaps read-only for that as well, but: read-only for your data files, none for your log files, and I don't recommend messing with the host caching for the OS disk. All right, now, what's the benefit of this? Take the same scenario as before, where we had the capping at 15,000 IOPS, and say the customers are noticing: hey, this is kind of a bad experience, I'm not overly impressed with what I'm getting here, can you help me out? Yes. So now, we're at 15,000 IOPS, customers are noticing an issue, and we have enabled caching like we had on the previous slide. Even though the VM can only support 12,800 IOPS from an uncached perspective, the cached support is 16,000 IOPS, so we are able to handle this scenario and return to the application the full 15,000 IOPS requested. We have no VM or disk capping at all, and guess what, there's no cost to doing that. There's no cost to enabling it; it's part of the VM. In fact, I would say there's a cost to not enabling it. That is a good point: it's free, and it will get you a huge amount more bandwidth that you otherwise wouldn't get. There's no reason not to turn this on for your data drives: read-only caching on all your data drives. Yeah, and because it looks like we're going to have a little bit of time, let me bounce over and just show you what it looks like when I click disks in the portal; this is what we're talking about. David, there's one thing I want to mention while you're pulling this up. We had a question in the chat about monitoring for I/O capping: what's the best way to monitor throttling? I dropped a link into our chat here, I don't know if you guys can surface it, to the storage I/O utilization metrics. We do have metrics in Azure to measure your I/O throttling at both the disk level and the VM level, and you can use Azure Monitor to set up those metrics if you want.
You can configure alerts if you like. I would give the caveat here that if you hit 100 percent consumed bandwidth, it doesn't necessarily mean there's a big performance problem, but if you're consistently at 100 percent for a long period of time, chances are you need to look at the VM or the disks. Yeah, and Pam, you can see where throttling kicks in; we'll just see those charts completely flatten out, right, and it's really obvious: oh yeah, you're hitting that 800 megabyte per second limit, for example. Yeah, a great example would be a backup. Let's say you take a nightly full backup; chances are that's going to push your I/O over the top, and maybe you'll experience some throttling, but if your backup is still getting completed in a timely manner and you're not seeing any performance issues, it's okay to hit that cap. It just means your backup will take a little bit longer. But if you're in a severe circumstance where that backup is now taking ten hours instead of two, that's where you may want to look into going to a burstable size, where you can burst for that backup, or just going to a higher VM. Yeah, and it's one of those business decisions: is it worth it to move to a bigger VM for that scenario? Backups, maybe ETL, right? It's that point where you have to measure, and it becomes a business decision. Okay, so this is what we mean by looking at your host caching and making sure that it is set appropriately. I didn't mess with this directly; this was a VM that I built through the marketplace, and the marketplace will take care of those settings for you. And I notice one of the drives is set to none; I'm assuming that's your log drive, David? Absolutely, yes. Okay, so it's important to set that to none for the log drive, because all writes in SQL Server use a flag that forces a write-through to the disk. All writes in SQL Server are not able to leverage that cache, because we have to guarantee resiliency; we have to guarantee the write goes all the way through to the disk. So no write caching for SQL Server at all, on either the data drive or the log drive, because of that write-through. Cool, this is really interesting. I mean, it's a lot of information, but it's also very interesting to see how these different scenarios play out, and how at every step we're trying to make it easier, because I think you were saying you set up that host caching just using the portal. Yeah, if you go through that marketplace experience, and especially with the transactional processing template, it's very, very good performance that I've seen, especially targeting the machines we've mentioned previously that are right-sized for SQL Server scenarios. Yeah, so this is a really good segue into a summary, and then we can take any questions. The key things here are to choose the right VM and optimize the storage. Look at the M series, the Edsv4, and the DS11 through DS15.
Those series are perfect for your memory-optimized scenarios, and you have the Lsv2 for the specific storage scenarios, like your operational data store or data warehousing workloads. Choose the right disk: Premium P30 or P40 for your data files. For your log file you're not going to set up caching, as Pam mentioned, so P30 and up there as well. Stay away from, in my opinion, the P20 and lower, because of that bursting situation; it's just unpredictable. It's a credit-based system, and we didn't get into this a lot earlier, we will when we do our storage deep dive, but being credit-based, you'll get 30 minutes of higher performance and then the credits replenish over time, so it's not the consistent behavior you would want for production situations. Isolate your files: data, tempdb, log. By the way, again, with the marketplace storage optimization that is done for you: the D drive has the caching support, it is separated from the log files, and tempdb is targeted to your ephemeral D drive by default. If you want to change that, because you've tested and found that in your particular case you want to put tempdb on premium remote storage, you can, but by default it goes on the ephemeral D, which for the most part is the approach you want to take. Again, we've mentioned enabling host caching for the data files, and optimizing your configuration when you need to take a manual approach or you're building your own outside of the template: make sure you're looking at your block size, the interleave, and the number of columns that Pam mentioned before. We do that for you in the marketplace templates, but if you're doing it on your own, you definitely want to check that. Again, I mentioned leveraging the D drive, the ephemeral disk. Write Accelerator, we'll get to more on that when we do our storage deep dive, but I'll say this: on the M series, Write Accelerator is really good for your log files. I would say it's a better option than choosing Ultra Disk if you're on that M series, and it's perfect for those scenarios where you're getting a lot of WRITELOG waits, or you just happen to be putting a lot of pressure on the log because of replication or change data capture, or maybe there's a lot of pressure from multiple users on top of, let's say, Always On availability groups and so forth. Write Accelerator can help you in that regard. And of course, as Pam mentioned, leverage Azure Monitor, leverage Azure metrics. It's awesome being in the cloud and being able to take advantage of what's in the Azure fabric to just improve your overall experience, including Azure Security Center and Azure Advisor and all those other wonderful things you get just because you're in Azure. That's actually a great segue into a question that we got in the chat, which I thought was really interesting: "I know Microsoft has to push Azure, but can I get an honest opinion on which is preferred, on-prem versus cloud?" I really think that, I mean, it's kind of an existential question for us, but what you touched on there, David: things like Security Center, things like the storage configuration, things like Azure Monitor, and we've got a whole host of new features. One thing we didn't talk about today was the SQL IaaS Agent extension, which is part of the value add you get from running SQL Server in Azure; there are some great features in there as well, and more coming. We're innovating constantly and adding new features that we're bundling into the service; some of them are an additional cost, but many of them just come for free. So there's value add to running in the cloud, and there are definitely reasons to stay on-prem too.
And David keeps throwing out this capex versus opex point; it's really a different way to look at spending as well. We're finding that customers who can actually turn off a data center, in other words take an entire data center and move it into the cloud, are finding it quite cost effective, ROI-wise, to run in the cloud. If you have one foot on-prem and one foot in the cloud, the value proposition may not be as high, but there is definitely some value add to running in the cloud, so it's a decision that every business has to make on their own. But we know how to run SQL Server, because we've been building SQL Server for over 20 years, so with the things we're doing in Azure, we're directly targeting what sort of features a DBA would need to successfully run SQL Server in the cloud. That's kind of the benefit you get from running on Azure. And part of that Azure fabric too, on top of everything you mentioned, is the machine learning and analytics on your own data store, to be able to tell you: hey, we're seeing a potential vulnerability, or this looks like a SQL injection attack, this looks like a risk to your environment. That kind of stuff is being surfaced to you via Security Center and via Advisor, and I don't know of anything like that on-prem. So just as part of the fabric of this hosted Azure environment, you're getting benefits there as well. If you're not familiar with those capabilities in the cloud when you're comparing it to on-prem, those are the little nuggets of wonderful things you find as you start deploying VMs and connecting, especially, like you said, when you're comparing from a hybrid perspective. I'm going to cut you off there, David, sorry, we are at time for today. All right then, a wonderful episode. We're going to have to have David and Pam back on the show to dive deeper into other SQL VM topics. David, Pam, thanks so much for joining us today. For our viewers, thanks for tuning in. If you liked this video, please like it and subscribe to our channel, and you can get the latest episodes every week here on Wednesdays at 9am Pacific. Thanks so much, and we hope to see you next time on Data Exposed. Thanks, take care. [Music]
Info
Channel: Azure SQL
Views: 1,000
Keywords: microsoft, developers, azure sql, azure sql virtual machines, azure sql vm
Id: MeuegGS2HEU
Length: 61min 2sec (3662 seconds)
Published: Wed Mar 10 2021