Azure Managed Disks Deep Dive

Captions
Hey everyone, in this video I want to dive into managed disks: the different types available, their performance characteristics, how we use them with virtual machines, bursting, and how to monitor what's actually happening. As always, if this is useful, a like, subscribe, comment, and share is appreciated, and hit that bell icon to get notified of new content and live events.

Before we get started I want to run a couple of commands that set up a demo for later on. I'm going to switch over to a remote machine, a virtual machine running in Azure, and first I'll run a performance test for just 10 seconds. I'll come back to it later and explain exactly what it's doing and why we care; I just want to run it now before I do anything else. Once those 10 seconds complete and we see the output, I'll kick the same test off in the background for 2,000 seconds, about 33 minutes, so it can have some fun while I talk through the rest of the presentation.

While that's running, let's get back to the goals for this session. In the days of old, before managed disks, we had unmanaged disks. What I actually created was a storage account, and remember that a storage account has certain properties, like the number of IOPS it supports. Within that storage account I would create a page blob, and within the page blob lived the VHD file. That's what was happening behind the scenes. An individual disk had a certain set of performance characteristics, like 500 IOPS, but the storage account had its own limits, so I had to be super careful not to run too many busy disks in one storage account or the account itself would become the throttle.

Managed disks solved this. Now we have a managed disk construct whose limits are based on, for example, the type of managed disk I create. Because it became a first-party ARM service, it added things like role-based access control, snapshots, first-party images, and integration with availability sets and availability zones to spread things out in a managed way. One big change is what I pay for: with managed disks I always pay for the provisioned size, whereas with unmanaged standard storage I paid for the used capacity. If I create a 100 GB managed disk, I pay for 100 GB whether there's 1 KB or 100 GB written to it. Really, the managed disk is just an abstraction: the storage account is still there, it's just hidden, and I only see the first-party service.

If we're nosy, we can still see the storage account behind it. Let me jump over and look at a little bit of code. First I can look at all of my disks with a simple Get-AzDisk, selecting the name, the resource group, the size in GiB, and an expression that pulls sku.name and stores it as the disk type. If I run that, I can see a number of disks with their resource group, size, and type. But I can also get a disk shared access signature, which I might want for copy operations. Using the Grant-AzDiskAccess command I'll ask for a read token that lasts 60 seconds; when you see this token it will be useless to you. Remember, this would be useful for copying a managed disk around, maybe into an empty managed disk. I'm running it against a disconnected disk, a lonely premium disk (you can tell from the name, "premdisklonely") that isn't attached to anything. The shared access signature it returns actually shows the storage account name. If I look at the SAS, right there is the storage account being used behind the scenes: md-<some random string>.blob.core.windows.net. So there's still a storage account behind all of this; it's just abstracted away for us, which is why we like dealing with managed disks.

Now, if I think about a disk, there are different characteristics to consider. The most obvious is capacity. We might think of that in gigabytes, but Azure actually uses gibibytes: in the documentation you'll always see GiB, that "G, little i, B". Strictly speaking, a gigabyte uses powers of 1,000 while a gibibyte uses powers of 1,024, and the same goes for kibibytes, mebibytes, tebibytes, and pebibytes. As the sizes get bigger, the difference actually starts to matter, which is why you see GiB, MiB, and TiB.

As a little distraction, let's fire up the calculator. A terabyte is 1,000 to the power of four: 1,000 × 1,000 × 1,000 × 1,000. A tebibyte is 1,024 to the power of four, and if you look at that number it's nearly 10% bigger. At kilobyte or megabyte scale the difference is tiny, but as you get bigger and bigger it really counts. That's why the documentation uses GiB and TiB: they show you the true size, counting in 1,024s rather than 1,000s.

Anyway, I digress. So we have capacity, which is how much I can actually store. The next thing we generally care about is IOPS, the input/output operations per second, and you'll also see throughput. Both are super important; they're all about the performance I'm going to see. Think of an operation as one particular thing I'm doing. I've seen the analogy of buckets: if I can fill four buckets a second, that's four operations a second, but the throughput depends on the size of the bucket. Four operations with a really big bucket gives far more throughput than four operations with a cup. So throughput is based on the number of operations I'm performing and the size of each operation. Sometimes we care about the number of operations, maybe lots of small ones; sometimes we care more about throughput, a smaller number of operations each moving a big amount of data.
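The calculator aside above can be captured in a few lines of plain Python (nothing Azure-specific):

```python
# Why Azure sizes use GiB/TiB (powers of 1024) rather than GB/TB (powers of 1000).
TB = 1000 ** 4   # a terabyte: 1,000,000,000,000 bytes
TiB = 1024 ** 4  # a tebibyte: 1,099,511,627,776 bytes

# Relative difference: the tebibyte is nearly 10% bigger.
diff = (TiB - TB) / TB
print(round(diff * 100, 1))  # -> 10.0 (percent)
```

At kibibyte scale the gap is about 2.4%; by tebibyte scale it has compounded to roughly 10%, which is why the unit distinction matters for billing and capacity planning.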
Maybe each operation is megabytes instead of kilobytes; then the throughput becomes more important. It really depends on what I'm doing. There are other characteristics too: am I doing reads or writes, random I/O or sequential I/O? That will impact the performance. The other one is latency, typically measured in milliseconds: I issue an operation from the compute resource, and how long does it take to get to the storage, be actioned by it, and be acknowledged back to me? That latency has an impact on performance as well, so we have to bear it in mind.

So we understand these different elements. Now, before I can go and pick a managed disk, which we're going to get to, I really have to understand my workload. There's the OS disk, but our application shouldn't be running on the OS disk; its data will be on data disks, and I need to understand the characteristics of that storage interaction. I need the averages (I'm really thinking about performance here): obviously I need the capacity for the amount of data, but what's the average IOPS, what's the average throughput? Then what are the peaks: what value does it spike to, how long does the peak last, and how often does it happen? Is it during startup of the machine, lasting 30 seconds or a minute? Or is it a domain controller with a logon storm for about 15 minutes? I have to understand the frequency, because when we start looking at the disks there are different characteristics that map to these patterns. So before we start trying to pick disks or anything else, I need to know my workload.
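The bucket analogy is really just multiplication: throughput is operations per second times the size of each operation. A tiny illustration (the numbers are arbitrary):

```python
def throughput_mibps(iops, io_size_kib):
    """Throughput in MiB/s given operations per second and I/O size in KiB."""
    return iops * io_size_kib / 1024

# Same number of operations per second, very different throughput
# depending on the size of the "bucket":
print(throughput_mibps(5000, 4))    # small 4 KiB I/Os  -> 19.53125 MiB/s
print(throughput_mibps(5000, 256))  # large 256 KiB I/Os -> 1250.0 MiB/s
```

This is why a workload doing many small random I/Os is IOPS-bound while one streaming large blocks is throughput-bound, even at the same operation rate.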
I have to have that good view of my workload: there's the average, maybe 100 IOPS (I'm just using IOPS as the example for a second), but maybe it shoots up to 1,000, does that for five minutes, then drops back, and maybe it has that peak every four hours. You need to understand those characteristics, because they really come in when we start trying to pick the most optimal storage solution. Yes, I could pick a disk that does 1,000 IOPS all day long, but maybe I don't need to; maybe there are other ways to optimize what I'm spending for the workload I'm actually running.

With that said, let's talk about the different types of managed disks available in Azure. We saw capacity, IOPS, and throughput, and there are different sizes within each disk family. For the most part you can picture a chart with capacity on one axis and IOPS and throughput on the other (obviously they're different values), and as the capacity goes up, so do the IOPS and the throughput, in a roughly linear progression. So when I'm picking a disk, I may pick based on the amount of data I need to store, sure, but maybe not: maybe I need higher IOPS or higher throughput, and the capacity I pick is driven not by the amount of data but by the need to reach a certain IOPS or throughput for my workload. That's really super important to understand.

We can see this in the documentation. If I look at premium SSD, for example, there are different sizes: the disk size in GiB goes up for the various disks, and as the size goes up, so does the provisioned IOPS per disk; once we get past the P6, so does the throughput. The bigger the disk, the better the performance, roughly linearly. With standard SSD we don't really see that same scaling at the lower levels; only at the really big sizes do the IOPS and throughput start to climb. The same goes for standard HDD: performance only really improves at the very big sizes. But it shows that if we want higher performance, we have to go and get the bigger disks.

So let's talk about the types of disks available, going up in order. The most basic is the standard HDD, which, as the name suggests, is based on hard disk drive technology. This is the lowest perf but the cheapest; the old adage, you get what you pay for. As we go up through the types, the performance will go up and so will the cost. A key point here: the performance we get is not guaranteed. What the documentation shows is a maximum, an "up to"; it's not provisioned, and I may not always see that number. From a latency perspective, the numbers typically given are around 10 milliseconds for a write and 20 milliseconds for a read. So we'd really use this for non-critical work: I'm playing around, I need some disks, I don't care about performance or consistency, I want it as cheap as possible. We can see that if we go back to the documentation for the standard hard disk drive and look at its metrics.
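That size-to-performance mapping can be sketched as a lookup table. The figures below are a subset of the premium SSD numbers quoted in the video; treat them as illustrative and check the current documentation for the real table:

```python
# name: (size GiB, provisioned IOPS, provisioned MB/s) - illustrative subset.
PREMIUM = {
    "P1": (4, 120, 25), "P2": (8, 120, 25), "P3": (16, 120, 25),
    "P4": (32, 120, 25), "P6": (64, 240, 50), "P10": (128, 500, 100),
    "P15": (256, 1100, 125), "P20": (512, 2300, 150), "P30": (1024, 5000, 200),
}

def smallest_tier_for(iops_needed):
    """Pick the smallest tier meeting an IOPS target: capacity driven by
    performance need, not by how much data you have."""
    for name, (size_gib, iops, mbps) in PREMIUM.items():
        if iops >= iops_needed:
            return name
    raise ValueError("need a bigger disk family")

print(smallest_tier_for(2000))  # -> P20: a 512 GiB disk chosen purely for IOPS
```

Notice that needing 2,000 IOPS forces a 512 GiB disk even if the dataset is only a few gigabytes; that is exactly the "capacity for performance" trade-off described above.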
Notice the wording it uses: "up to 500", "up to 60 MB per second". It's really important to understand that phrase: it's an "up to", not a provisioned number. And we can see it talks about those latencies: write latencies under 10 milliseconds and read latencies under 20 milliseconds for most operations. Again, that isn't a guarantee, but it's what we can expect for the most part.

Okay, then we get to standard SSD. Here we're going to see higher perf, and that will just continue to go up through the families, but obviously it's going to cost a bit more. Once again, though, the perf is not guaranteed; we'll see that same "up to" statement. What we do get is a nice free credit-based bursting: the baseline performance isn't guaranteed, but we have the ability to go above it using a bucket of burst capability. I'm going to talk a lot more about this at the end (that's why I kicked off that command at the start, to get something running in the background), but we get a credit-based system. It's like my cell phone minutes: if I don't use them all, I can carry them over and use them later in one bigger bang. For latency here we're talking single digit milliseconds, so it should be better than the standard HDD. The workloads are still low storage requirement, maybe little basic web servers that aren't doing a lot, but I'm not going to use it if I ever need guaranteed performance; I'm just not going to get that with standard SSD. And once again the documentation shows it: compared to standard HDDs, standard SSDs have better availability, better latency, and so on, but the key wording is still "up to"; it is not provisioned. It does call out the max burst, though: bursting for up to 30 minutes on disks of one tebibyte or smaller. Again, I'll go into detail on that.

Okay, now we get into premium. Premium SSD is really where a lot of companies start when thinking about production workloads. The performance just keeps getting better, and the price keeps going up. One difference today with premium: remember we talked about how I basically pay for the capacity and get a certain performance that aligns with that capacity. With premium that still happens, so when I pick the SKU of my disk I pick a capacity SKU and get a corresponding set of performance, but it's also possible to specify a different, higher performance SKU for the disk. The idea is: say I pick a disk for the size I need, but I have a fairly long period where I need higher performance, maybe a big import that will take two days. I can raise the performance of the disk, and once the job is finished, bring it back down again. The whole point is that I can't constantly flip it back and forth: I believe I can only downgrade the performance tier every 12 hours. But this is really useful, because I don't want to increase the size of the disk; if I increase the size, I can't shrink it again. Normally the capacity and its baseline performance are absolutely fine; when I'm doing a big job that needs higher performance for a limited time, I can separately raise the performance of the disk without making it bigger and getting stuck.

Let's jump over to the portal for a second to see this in action. Ordinarily, in GA, to do this the disk has to be detached from the VM or the VM deallocated, but in preview it can be done dynamically. This is a premium SSD, and if I go to Size + performance, the primary thing you pick is the disk SKU. Mine is a 4 GiB disk, so I get 120 IOPS and 25 MB per second. But if I scroll down, I can pick a different performance tier. I could leave the disk at that size but say I need 5,000 IOPS and 200 MB per second for the next 24 hours, and change the performance separately without resizing the disk. What do I pay for? Whichever one is higher, which will always be the performance SKU, since I can't set it lower than the disk's own tier. So for the period where I raise the performance SKU, I'll be paying for a P30 even though the capacity of the disk hasn't changed, and once I'm finished I change it back down. I think there's a window for the change to take effect, within an hour or maybe quicker. So that's something unique about premium: separate capacity and performance.

And with premium we are now into provisioned performance. It is not "up to": the wording on the site is now "provisioned IOPS per disk" and "provisioned throughput per disk". That is super important; it's basically guaranteeing that is the performance I will get.
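The pay-for-the-higher-tier rule from the portal walkthrough can be modeled like this; the tier prices below are invented purely for illustration:

```python
# Hypothetical monthly prices per tier (made up; see the real pricing page).
HYPOTHETICAL_PRICE = {"P1": 0.60, "P20": 73.00, "P30": 135.00}

def billed_tier(capacity_sku, performance_sku):
    """You are billed for whichever tier is more expensive: the capacity SKU
    or the separately raised performance SKU."""
    return max(capacity_sku, performance_sku, key=HYPOTHETICAL_PRICE.get)

print(billed_tier("P1", "P30"))  # -> P30: billed as P30 while performance is raised
print(billed_tier("P1", "P1"))   # -> P1: back to normal once the job finishes
```

The useful property is that the capacity never changed, so there is nothing to shrink afterwards; only the billing tier moves up and down.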
So if I need guaranteed, consistent performance, that's where I get into the premium SSD world. Interestingly, I still get bursting here, but there are actually two types. Yes, I get the credit-based bursting: once again that bucket of up to 30 minutes, and once again it's free. But there is also an on-demand option, and that costs me money: it costs money to turn on, and it costs money when I actually use it. It's designed for longer bursting; maybe those 30-minute buckets aren't enough and I need to burst for longer. With on-demand, I turn the feature on, and whatever additional performance I use, I get billed for. The way it's split: the credit-based system applies below the P30 (so 512 GiB and smaller, I believe), and on-demand is for the P30 and above. If we look at the site for a second, the P20 sits below the P30 and has that max burst duration, while the P30 and above don't have that option but instead have on-demand bursting. The documentation describes the different burst options: the automated credit system with its max, which is always turned on by default, and then the on-demand bursting. And if we jump over to the managed disk pricing page and look at premium disks, it says that for a P30 and larger you pay a monthly enablement fee to turn bursting on, then a burst transaction fee of half a penny per ten thousand I/Os for the additional IOPS. So you pay if you want to use it, but for the P20 and below you get that free credit bucket. As with everything else, the documentation (I'll add it in the description) covers how to enable on-demand bursting through PowerShell, the CLI, and so on. So as we can see, for premium SSD we have a lot of options.
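As a rough model of that pricing: the enablement fee below is a made-up placeholder, while the per-I/O figure is the "half a penny per ten thousand I/Os" quoted above; check the pricing page for current numbers:

```python
ENABLEMENT_FEE = 4.00  # hypothetical monthly enablement fee (placeholder)
PER_10K_IOS = 0.005    # "half a penny per ten thousand I/Os" from the video

def on_demand_burst_cost(extra_ios):
    """Monthly on-demand bursting cost for a given number of I/Os above
    the disk's provisioned rate, in this simplified model."""
    return ENABLEMENT_FEE + PER_10K_IOS * (extra_ios / 10_000)

# Example: 2,000 extra IOPS sustained for one hour = 7.2 million extra I/Os.
print(round(on_demand_burst_cost(2000 * 3600), 2))  # roughly $7.60 that month
```

Running numbers like this is how you decide between on-demand bursting and simply raising the performance SKU: sustained extra I/O makes the per-transaction fees add up quickly.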
We get provisioned performance; I can separately change the performance SKU; if the disk is smaller than a P30 (512 GiB or less) I get the free 30-minute bucket; or I can use on-demand bursting. You might say, which one do I use? It's really about how long it needs to run. If I need higher performance but still for a fairly limited time, I can use on-demand bursting; but if it consistently needs more than what the capacity gives me, it will be cheaper to just change the performance SKU, which gives a better overall performance anyway. Latency is that single digit millisecond range. So really, premium is production: these are production workloads where I need guaranteed, consistent performance.

And then finally we get to ultra SSD. This is the highest performance, and it's actually very unique. The other families all have SKUs based around capacity; ultra has capacity first, but also performance as separate settings. I have capacity, I have IOPS, and I have throughput: three dials, and I pay for how I set each of them. The IOPS and throughput I can actually change dynamically. So I could say: I have a busy period overnight, let's increase the IOPS and the throughput and pay more, and when that job is finished, let's lower them again to optimize my spend. We see this on the pricing page: if we go and find ultra disk, it shows an IOPS range, and once I essentially hit one tebibyte I can go up to 160,000 IOPS and 2 GB per second. Notice how it's charging me: I pay a certain amount for the capacity, a certain amount for the provisioned IOPS, and a certain amount for the provisioned throughput. And if I have a VM configured to use ultra but don't connect an ultra disk to it, there's a certain charge for that as well. So I have dials I can change at any time. In my little sample code file there's an example of changing it dynamically: I create a new config with New-AzDiskUpdateConfig, setting a new IOPS value, then update the disk. The disk could be connected to a running VM and I'm just dynamically changing its IOPS. It's really powerful how I can use this.

So that's the ultra disk. Once again it's provisioned performance, so it's guaranteed, and it has no bursting, but you could argue it doesn't need it: I would just dynamically change the IOPS and throughput as needed. Latency is where it gets crazy: sub-millisecond, less than one millisecond. In terms of use cases, this is databases and other workloads needing super high storage perf.

So we have these different types of disks, and to summarize: as you go up through them, the perf gets better and the cost gets higher. A few other little features I should mention. All of these are LRS, locally redundant storage, so there are three copies of all the blocks you write. Standard SSD and premium SSD also support ZRS, where there are still three copies of the data but they're spread over availability zones. Another feature that may be of interest: both premium SSD and ultra SSD can be shared, meaning I can connect them to multiple virtual machines.
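The three-dial model can be sketched as follows, with entirely hypothetical unit prices just to show the shape of the bill:

```python
# Made-up unit prices; ultra disks bill capacity, IOPS, and throughput separately.
PRICE_PER_GIB_HOUR = 0.000082   # hypothetical $/GiB-hour
PRICE_PER_IOPS_HOUR = 0.000034  # hypothetical $/IOPS-hour
PRICE_PER_MBPS_HOUR = 0.000256  # hypothetical $/(MB/s)-hour

def ultra_hourly_cost(gib, iops, mbps):
    """Three independent dials; each one you turn up costs more."""
    return (gib * PRICE_PER_GIB_HOUR
            + iops * PRICE_PER_IOPS_HOUR
            + mbps * PRICE_PER_MBPS_HOUR)

# Dial IOPS/throughput up for the overnight batch, back down afterwards;
# capacity stays the same the whole time.
quiet = ultra_hourly_cost(1024, 5_000, 200)
busy = ultra_hourly_cost(1024, 80_000, 1_200)
print(quiet < busy)  # True
```

Because only two of the three dials move, scripting the change (a function, a logic app, a schedule) lets you pay for peak performance only during the peak.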
They come across as a shared disk: they support SCSI persistent reservations, for example, so maybe I have a cluster or something else that needs a shared disk. So those are essentially my options. These are the types of disk, and you can see: performance goes up, cost goes up, I get some nice bursting capabilities, with ultra I get that dynamic capability, and latency improves as we go down the list. I'm going to pick based on my workload, and that's really the key point. One note on changing ultra: there is a delay when I make the change. I could do it via a function, a logic app, a schedule, or a trigger, but it can take up to an hour (it shouldn't, but it could), and I believe I can resize four times every 24 hours.

One other comment before we start looking at bursting. We talked about the capacity of the disk: when I create a managed disk, I pick a size. Let's say I pick six gigabytes; well, there isn't a six gigabyte premium SSD, so from a billing perspective I pay for the next size up. If I create a six gigabyte premium managed disk, it's going to bill me for an eight gigabyte, a P2. Just realize there are no in-betweens here: create a six gigabyte disk and I'm charged for eight gigabytes, but I also get the performance of an eight gigabyte disk. So what I actually get billed for is the next tier up, in this case a P2, which is eight gigabytes.

I can resize disks: I have to either detach the disk or deallocate the VM it's connected to, but I can increase the size of a managed disk.
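The round-up behavior is easy to model; the size ladder below is the premium sequence referenced in the video:

```python
# Provisioned premium sizes in GiB (P1..P30 from the video's examples).
PREMIUM_SIZES = [4, 8, 16, 32, 64, 128, 256, 512, 1024]

def billed_size(requested_gib):
    """A managed disk is billed (and performs) as the next provisioned
    size up; there are no in-between sizes."""
    return next(s for s in PREMIUM_SIZES if s >= requested_gib)

print(billed_size(6))    # -> 8: a 6 GiB disk is billed as an 8 GiB P2
print(billed_size(100))  # -> 128: a 100 GiB disk is billed as a P10
```

The silver lining is that you also get the larger tier's performance, so the rounding is not pure waste.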
Making disks bigger is very common: often I create a virtual machine from an image, the disk that gets copied is the image's size, and I need it bigger. Increasing is fine; I can do that. I cannot decrease; I cannot make a disk smaller. I'd basically have to create a new disk and copy the content over, so keep that in mind. That's exactly why premium lets you have a different performance tier: maybe I need the better performance for only a short amount of time, and if I resized the disk I'd be stuck, because I could never shrink it again. Separating the performance from the capacity solves that problem. And remember, making a disk bigger only makes it bigger in Azure; within the OS that's using it, I'd have to go and extend the volume into that new space, or create a new volume in the space left at the end.

Speaking of which, this has all been about the disk, but most likely I'm connecting it to a virtual machine, and that VM itself has attributes. It could be a regular virtual machine, part of a virtual machine scale set, an AKS worker node, or running a managed database; it doesn't really matter. The VM has attributes like the number of virtual CPUs and the amount of memory, but focusing on the storage side: it has a limit on the number of disks I can connect, and it has an IOPS limit and a throughput limit as well. So I connect an OS disk, which has its own IOPS and throughput, and maybe data disk 1 and data disk 2, each with their own IOPS and throughput, maybe supporting burst depending on the disk type. Some VMs actually have burst as well: I think the Lsv2, Dsv3, and Esv3 support bursting, and that might change over time, so make sure you go and check what supports burst. There are also temporary disks: a temp disk with certain characteristics, and in some sizes even the temp disk is optional. Some VMs have NVMe ephemeral, locally attached storage with phenomenal IOPS and throughput for when I need local performance. There's also the network, which is important to understand because the network is rated for a certain number of megabytes per second, and things like SMB, NFS, and even iSCSI go over the network of the virtual machine; they don't use its storage limits.

If we actually go and look at a virtual machine, say the Dsv3, we see those numbers: CPU and memory, sure, but also the temp (cached) storage performance, burst on the temporary storage, and then the uncached disk throughput and IOPS it can do. This series actually supports bursting for its storage, so I see a burst number too, along with how many data disks it supports and the expected network bandwidth. Note that to use a premium SSD I have to use the s variant of a virtual machine, the Ds, the Es; those are required for premium. If I look at a non-s variant, you'll notice it doesn't list data disk IOPS and throughput, because what's provisioned for the VM behind the scenes is based on a 500 IOPS disk. If it supports four disks, that's four times 500, so 2,000 IOPS. It's not a hard limit: some standard HDDs and standard SSDs can go higher than 500 per disk, and the VM may actually be able to achieve higher numbers, but it's not provisioned for that, so there's no guarantee. That's why we don't see those numbers for the non-s variants: there's nothing provisioned beyond 500 IOPS per data disk.

Why is this important? The disks have a certain performance, and the VM has a certain performance. There's no good in throwing a ton of high-performance disks at a VM if their numbers are bigger than what the VM can handle; the VM will cap your IOPS, and the same goes for throughput. If I need a certain performance through my solution, I have to make sure the disks meet those numbers in terms of IOPS and throughput, and I also have to make sure the VM does. Just like disks, the IOPS and throughput generally go up the bigger the virtual machine, and there are different types of VM: storage optimized, compute, memory, GPU. I'd look at the different families and understand the overall characteristics: how many vCPUs I need, how much memory, how much network, what IOPS and throughput, and again what bursting might happen, because maybe a bursting VM meets my need. What you don't want is really powerful disks with a VM that's always going to cap you; you're just wasting money on the performance of those disks.
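That cap is just a minimum of the two limits. A sketch (the VM number here is illustrative, not a specific SKU's guarantee):

```python
def effective_iops(disk_iops_list, vm_uncached_iops):
    """Actual uncached performance is bounded by BOTH the attached disks
    and the VM size limit, whichever is lower."""
    return min(sum(disk_iops_list), vm_uncached_iops)

# Two 5,000 IOPS disks on a VM with an (illustrative) 6,400 uncached IOPS cap:
print(effective_iops([5000, 5000], 6400))  # -> 6400: the VM is the throttle

# A non-s variant provisioned at roughly 500 IOPS per data disk, four disks:
print(effective_iops([500] * 4, 20_000))   # -> 2000: the disks are the limit
```

Balancing means keeping the two sides of that `min()` in the same neighborhood; paying for whichever side is much larger is wasted money.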
One will generally be a little bigger than the other, and that's fine; I just try to make sure I can meet my actual requirements. So make sure you consider the VM as well — getting capped on the virtual machine would be a sad day.

Now let's talk about that bucket. Remember I said that for the standard SSD and the premium SSD we get free, credit-based bursting for up to 30 minutes; we saw that when we looked at the disk documentation. If we go back and look at the disks for a second: for the premium SSD, even though the provisioned IOPS ranges from a tiny 120 all the way up to 2,300, those sizes can all burst up to 3,500 IOPS and 170 megabytes per second, for up to 30 minutes. The standard SSD similarly has its 500 standard IOPS and a standard throughput, and it can burst to slightly higher IOPS and to significantly higher throughput — again for up to 30 minutes.

So how is this working? Fundamentally, the idea is that I have a burst bucket, and it starts off full, with 30 minutes' worth of bursting in it. Now I have my workload running against one of my disks. That disk has its expected, provisioned performance line, and then there's what I'm actually doing. If I go above that line, everything above it comes out of the bucket — the bucket level drops as I use it up. When I go below my provisioned amount, the difference goes back in; it refills the bucket. It really is like minutes on a phone plan: if I use less this month, I can carry them over to the next month and do more talking.

There is an upper line, though: the max burst IOPS. So when I go above my provisioned line, I can go higher only if there's burst left in my bucket, and when I drop below my provisioned line I start filling it back up. Obviously, the bigger the disk, the faster the refill: if I'm a really small disk provisioned at 120 IOPS, then when I drop to idle I'm still only adding 120 IOPS' worth per second, whereas a much bigger disk might be adding 2,300 IOPS' worth when idle. You get the idea — the size of the disk determines how fast I can fill my bucket and therefore how much sustained bursting I can do. We see that in the disk sizes: my little P1's provisioned IOPS is 120, so idling, I fill the bucket at 120 IOPS per second, whereas a P20 idling adds 2,300 per second. They can all burst to the same number, but if I emptied the bucket, refilling it on my poor little P1 is going to take a very long time, whereas the P20 will refill its 30-minute bucket much, much quicker — on the order of tens of minutes, whereas the P1, trying to refill a bucket that covers 3,500 IOPS for 30 minutes at only 120 IOPS, could take the better part of a day. My mental math isn't that good, but you see the idea: it obviously takes longer to refill the bucket on a smaller disk. And again, throughput works exactly the same — there's the same kind of bucket for throughput.

I want to actually show you this bursting. If we jump over for a second, I have a virtual machine, and you can see it has two data disks — little four-gigabyte disks, so they are these puny little P1s: 120 IOPS, 25 megabytes per second, so they are very small.
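The refill arithmetic above can be made concrete. This assumes a simplified model — the bucket holds 30 minutes of bursting *above* the provisioned rate, and refills at up to the provisioned rate while the disk is idle. Azure doesn't publish the exact accounting, so treat this as an estimate, not the real implementation.

```python
# Rough burst-credit math under an *assumed* model:
# credits = 30 minutes of IOs above the provisioned rate,
# refilled at the provisioned rate while idle.

def bucket_credits(provisioned_iops: int, burst_iops: int,
                   burst_seconds: int = 30 * 60) -> int:
    """Total IO credits: the extra IOs above provisioned, for 30 minutes."""
    return (burst_iops - provisioned_iops) * burst_seconds

def refill_hours(provisioned_iops: int, burst_iops: int) -> float:
    """Hours to refill an empty bucket at the provisioned rate, fully idle."""
    return bucket_credits(provisioned_iops, burst_iops) / provisioned_iops / 3600

# P1: 120 provisioned, 3,500 burst -> roughly 14 hours to refill,
# which lines up with the ~14-hour refill visible in the metrics later.
print(f"P1:  {refill_hours(120, 3500):.1f} h")
# P20: 2,300 provisioned, 3,500 burst -> well under an hour.
print(f"P20: {refill_hours(2300, 3500):.2f} h")
```

Under this model the P1's bucket holds about 6 million IO credits, so draining it at full burst takes exactly the documented 30 minutes, but refilling it at 120 IOPS takes around 14 hours.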
Now remember, I can add and remove data disks dynamically — I can attach a disk while the VM is running; I don't have to shut it down. This little four-gigabyte disk can burst to 3,500 IOPS, and I'm running it on my Dsv3, which has an uncached IOPS limit of 3,200 without its own bursting — keep those numbers in mind.

Let's look over here for a second — that 33-minute run has finished. What I want to show first is the 10-second test I did at the start, so let's scroll all the way to the top. What I'm really interested in is the IOPS figure, and look at that number at the bottom: 3,276.42. I was actually doing 3,276 IOPS on that little disk that should only let me do 120, because I'm bursting — and that's exactly what I can see, because I only ran it for 10 seconds against a full bucket. Then you saw I ran it again for 33 minutes, and there we see something different: the IOPS are still pretty high, but a bit lower, because I exhausted my bucket. That's what I was trying to show. Now, if I rerun it right now for 10 seconds, the bucket will have started to refill, but it refills pretty slowly, so I'm not expecting to see that 3,200 — although 10 seconds might be too short to tell, and I've been talking for so long it may have recovered something. Let's see what we actually get. And sure enough, I'm still seeing pretty good IOPS, because the bucket refilled over that limited amount of time — obviously the bucket refills, and I've clearly been talking way, way longer than I thought, as always.

So let's look at that virtual machine for a second; what I want to do is show you some metrics. I'm going to jump over and go to the VM's metrics blade, where we can look at a number of different metrics. The first interesting one is Data Disk Target IOPS, and we can see 120 — that's my provisioned target; yep, that's what we expected. Now I'll add another metric, Data Disk Max Burst IOPS, and there's the 3.5K. Then, since I was doing all read operations, I'll add Data Disk Read Operations/Sec with the Max aggregation. I don't want to look at 24 hours — you can see I ran this test before — so I'll scope it to the last hour, and there's my test. Notice that tiny bit right at the end where it dropped: it dropped because I had exhausted my bucket at that point — this was the 30-minute run. We can actually see the bucket itself: if I add another chart with Data Disk Used Burst IO Credits Percentage, again the maximum over the last hour, you can see I started at zero percent of my bucket used, then steadily used it up and maxed it out — and at the point I maxed it out, the other chart shows my performance starting to drop. Then you'll see it start to go back up again as the bucket refills. If I change the time range to 24 hours — because I also ran the test yesterday — you can see it maxed out, the performance dropped, and then over the course of basically 14 hours the bucket filled back up while the disk was idle, back down to zero percent used; then I used the bucket up again, and now it will start to refill again. So I can actually see exactly what the bucket is doing, which is really cool.
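The pattern in those charts — full speed for 30 minutes, then a sudden drop to the provisioned rate — falls out of a simple token-bucket simulation. Same simplified model as before (credits are IOs above the provisioned rate; demand below the line refills them); this is illustrative only, not Azure's actual algorithm.

```python
# Tiny credit-bucket simulation (assumed model, not Azure's real one):
# demand above the provisioned rate drains credits; once they're gone,
# the disk falls back to the provisioned rate.

def simulate(provisioned: int, burst: int, demand: int, seconds: int) -> list[int]:
    capacity = (burst - provisioned) * 30 * 60   # full bucket: 30 min of burst
    credits = capacity
    achieved = []
    for _ in range(seconds):
        want_extra = max(0, min(demand, burst) - provisioned)
        extra = min(want_extra, credits)          # spend credits if available
        credits -= extra
        # unused provisioned IOPS refill the bucket, up to its capacity
        credits = min(credits + max(0, provisioned - demand), capacity)
        achieved.append(provisioned + extra)
    return achieved

# A P1 (120 provisioned / 3,500 burst) hammered at 3,500 IOPS for 40 min:
# we sustain 3,500 for ~30 minutes, then drop to 120 when the bucket empties.
iops = simulate(120, 3500, 3500, 40 * 60)
print(iops[0], iops[35 * 60])   # -> 3500 120
```

This reproduces the shape of the Read Operations/Sec chart: the cliff appears right where the burst credits hit zero.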
Now, the other thing I wanted to show when looking at metrics is splitting. Remember, I had two data disks. On a new chart I'll add Data Disk Read Operations/Sec with the Max aggregation again — but since I have two data disks, I want to see them individually. I can click Apply splitting and split on the LUN dimension, and now I see both disks with their individual numbers: LUN 0 is the one doing all the work, and LUN 1 is really not doing anything. So you can go and get all of that data — and while I'm showing IOPS, there are equivalent metrics for throughput and for the OS disk; there's a phenomenal amount of data available. One caveat: sometimes you have to look inside the guest OS. Capacity, for example — the volumes and data created inside the disk are something the OS knows about, not Azure — so I'd look at the guest properties to see the used capacity of a disk. Sometimes you have to go to different places.

And that was it — that's everything I wanted to cover. Hopefully it makes sense: the disks have different characteristics, and it's important to understand my workload — the average, the peaks, how high it peaks, how long it peaks for, how often it peaks — because maybe I can use the burst capabilities of the disks to handle that. Maybe I can't, in which case maybe I change the performance SKU, use on-demand bursting, or, with Ultra Disk, automate increasing and decreasing the IOPS and throughput to match when those workloads or batch jobs run. We looked at the different types of disk: as we go up we get higher performance, higher cost, lower latencies, and maybe more flexibility in the feature set. Don't forget about the VM — it has its limits as well, so make sure you're considering those when you plan the all-up solution. And remember that bucket: for standard SSD and premium SSD data disks, 30 minutes of bursting is super powerful — just realize that the more powerful the disk, the quicker you refill the bucket when it's idling. That was it; I really hope this was useful. Please, please subscribe if it was — it's really appreciated. Until next time, take care.
Info
Channel: John Savill's Technical Training
Views: 16,877
Keywords: azure, azure cloud, microsoft azure, microsoft, cloud, storage, managed disk, disks, bursting, metrics, monitoring, performance
Id: 2nPZyLmciN4
Length: 57min 39sec (3459 seconds)
Published: Thu Jun 10 2021