Deep dive: Autoscale provisioned throughput for Azure Cosmos DB - Episode 25

Video Statistics and Information

Captions
Hi everyone, welcome back to Azure Cosmos DB Live TV. Welcome back, Deb, good to have you again. Deb was the very first guest we had on this show, 25 episodes ago; she led us off with our inaugural episode, so it's great to have her back. Glad to be back, thanks Mark. And we have a new guest with us this week, Rav. How are you? Hi, I'm good, Mark, how are you? Good.

So Deb is on the PM side. Aside from everything else she does, she's been the PM for all of our elasticity features; is that the right word for it? Yeah, all the elasticity features, which is just a fancy word for all the throughput you get in Azure Cosmos DB. And Rav is on the dev side of that, the developer counterpart. In this episode we wanted to talk about one of the very cool features of Cosmos DB, which is autoscale.

For those of you who have been using us for some time, we didn't always have autoscale available in the service. You would provision throughput, what we now call standard (manual) throughput, and it worked a bit like a VM, which is probably the easiest way to help people understand it. You provisioned some amount, and whether you used it or not, that's what you got and that's what you paid for. You could provision a huge VM, play solitaire on it, and pay a hundred or a thousand dollars a day for it, which would be a huge waste of money. You could do the same kind of thing in Cosmos DB, and sometimes people did: they would over-provision, or they wouldn't have enough. That's where autoscale came from. It was built to help with both of those scenarios, where people were either over-provisioned and didn't need all that throughput, or under-provisioned and getting rate limited.

Yeah, exactly. You touched on one of the biggest pieces of feedback that led us to build autoscale in the first place: even though customers appreciate that Cosmos DB gives you super fine-grained control of your RU/s, figuring out what to actually set those RU/s to and managing that capacity is definitely a challenge. With autoscale we've made that a lot easier by making it possible for you to let Cosmos DB handle the scaling and the management of the capacity for you. The way autoscale works is that we automatically scale RU/s up and down based on your usage, within a 10x range. As you'll see in today's session, you set the highest RU/s you ever want to scale to and then just start using your application and Cosmos DB; depending on what you actually need, Cosmos DB scales between that max RU/s and one tenth of it. For example, say you set the highest RU/s you ever want to go to at 50,000, which is a pretty high number for Cosmos DB by the way. Maybe you have a giant event where you know there's going to be a spike; that might be your peak, but you don't expect it for the rest of the year, or even the rest of the day. You might set that as your max RU/s and use it for a couple of hours, and then we'll scale you down to the absolute minimum of ten percent of the max when nothing is in use. The way it ends up saving you a lot of money is that you set the max you want, we scale you between that and the minimum, and when you're not using it, instead of you having to go back and change that number, we change it for you automatically.
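(For readers who want to try this from code rather than the portal, here is a minimal sketch of creating a container with an autoscale max of 50,000 RU/s using the Cosmos DB .NET SDK v3 that Rav uses later in the demo. The endpoint, key, database, and container names are placeholders, and a .NET 6+ top-level program with implicit usings is assumed.)

```csharp
using Microsoft.Azure.Cosmos;

// Placeholder endpoint and key; "demo-db" and "events" are hypothetical names.
CosmosClient client = new CosmosClient("https://<account>.documents.azure.com:443/", "<key>");
Database database = (await client.CreateDatabaseIfNotExistsAsync("demo-db")).Database;

// Autoscale: specify only the highest RU/s you ever want. Cosmos DB scales
// between that max and 10% of it, so 50,000 here means a 5,000-50,000 RU/s range.
ThroughputProperties autoscale = ThroughputProperties.CreateAutoscaleThroughput(50000);

Container container = (await database.CreateContainerIfNotExistsAsync(
    new ContainerProperties(id: "events", partitionKeyPath: "/pk"),
    autoscale)).Container;
```

As discussed later in the episode, you are then billed each hour for the highest RU/s autoscale had to scale you to in that hour, down to the 5,000 RU/s floor when the container is idle.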
And when you use autoscale: I've seen autoscale with VMs and other types of compute, but our autoscale is instantaneous, right? As soon as the request load goes up, we're responding to it? Exactly. The way it works is that when you tell us the max RU/s you ever want to go to, that's a really clear signal to us that we need to always be prepared to handle that max at any time. So we actually pre-provision all the compute and all the resources needed to guarantee that if you ever need to hit that maximum, you can do so instantaneously. In that way it's unlike other types of autoscale where, say, you try to size up a VM; that can take time, and your app can't really tolerate that downtime. With Cosmos DB, because you've already told us the max you ever want, scaling within that max and the 10x range is always instantaneous.

Got it. And it's not the same price as standard throughput, correct? You pay a little bit more for it? Yeah, you do pay a little bit more, and we can get into when it makes sense to use autoscale and when it doesn't. Partly it's just due to the fact that we have to provision you for the max and let you use it whenever you need it; it's kind of like a standby, highly available database service with all the SLAs, ready for you. So it costs a bit more, but we'll get into some examples later of when it makes sense to use autoscale. There are definitely times where it doesn't make sense, and that's totally fine; you should continue using manual throughput if that's the right setting for your workload. Yep, that's cool. Helping people figure out when it's the right time to use it makes it easy for them to make that decision.

So, do we want to get into some demos, Rav? Absolutely, let's do some demos. Thanks for the overview, Mark and Deborah. I'm in the portal, in my database, so let's go ahead and create a container. For the demo I'm going to start with a manually provisioned collection, and then we'll see how the workload I'll be simulating fares once we switch to autoscale. I'll pick manual, 400 RU/s is fine, and create it. I always love watching stuff provision in the portal; my favorite demo of all time. Okay, it looks like we finally have it.

Let's jump into Visual Studio now, and I'll give a bit of an overview of what we're trying to do. I'm using the .NET SDK v3 for Cosmos DB, and I have about five threads that we'll use to simulate this workload. Another thing I want to point out is that in my CosmosClientOptions I've turned off retries on rate limiting, just for the demo's sake; if you leave that setting alone, the SDK retries rate-limited requests internally, and I don't want that here. So we're creating a bunch of threads, each trying to create an item; let's get started and put a bunch of load on the container. Right. So how much throughput have you got on this thing? This is the standard throughput container you created? Yeah, this is the manually provisioned collection we just created. Okay, let's see how it does; let's blow stuff up, I like this. Okay, looks like you're getting some throttles.
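(The harness itself isn't shown line by line in the captions, so here is a hedged sketch of what a similar load generator could look like: the SDK's automatic retries on 429s turned off through CosmosClientOptions, and a handful of concurrent writers inserting documents as fast as they can. The account, database, and container names and the document shape are invented for illustration; .NET 6+ implicit usings are assumed.)

```csharp
using System.Net;
using Microsoft.Azure.Cosmos;

CosmosClient client = new CosmosClient(
    "https://<account>.documents.azure.com:443/", "<key>",
    new CosmosClientOptions
    {
        // Disable built-in retries on rate limiting so 429s surface immediately.
        // Demo only; in production you would normally keep the default retries.
        MaxRetryAttemptsOnRateLimitedRequests = 0
    });

Container container = client.GetContainer("demo-db", "demo-container");
int successes = 0, throttles = 0;
DateTime deadline = DateTime.UtcNow.AddMinutes(2);

// Roughly five concurrent writers, mirroring the demo.
var writers = Enumerable.Range(0, 5).Select(async _ =>
{
    while (DateTime.UtcNow < deadline)
    {
        var doc = new { id = Guid.NewGuid().ToString(), pk = "load-test" };
        try
        {
            await container.CreateItemAsync(doc, new PartitionKey(doc.pk));
            Interlocked.Increment(ref successes);
        }
        catch (CosmosException ex) when (ex.StatusCode == (HttpStatusCode)429)
        {
            Interlocked.Increment(ref throttles); // request was rate limited
        }
    }
}).ToList();

await Task.WhenAll(writers);
Console.WriteLine($"Succeeded: {successes}, throttled (429): {throttles}");
```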
On average we're seeing about 30 to 40 writes per second against the 400 RU/s we provisioned, and around 140 throttles. While this is running, let's go back to the portal, and what I'm going to show you is how easy it is to switch modes between autoscale and manual. Go to Scale & Settings, pick autoscale (you don't even have to pick a throughput value), and let's see how things change when we hit Save. It's still updating... and look, now we're getting more throughput and not seeing any throttles. This is cool, and you did this all online, no downtime? Yep, totally. That's awesome.

So the scale range we have currently is: we started with 400 RU/s provisioned, and now we're running in a 400 to 4,000 RU/s range. Very cool. You just set the new max and submit it, and behind the scenes, like magic, we set that up for autoscale. The workload in this case needed 4,000, so since 4,000 is available with autoscale it's consuming as much as it can, with no throttles. Now can I go back again? Definitely, in a couple of clicks: pick manual and hit Save. I noticed I can't set a throughput value while it's migrating, right? Right, but once the migration finishes you have total control again. So once it's finished, it's back at 4,000; let's change it back to 400, the way we started. You need to refresh, then go back, set 400, hit Save, and now I should start seeing 429s again. Yeah, there they are, they're back. I love it, and also with zero downtime. I'm joking, but I can go make a mess of my application like this with zero downtime and throw all kinds of 429s again. This is really cool; I like how we do it all seamlessly.

Yeah, and even from the surface area of the SDK it's pretty simple. If you're used to creating provisioned containers, you might be familiar with creating one like this: you create ThroughputProperties, say you want manual throughput with 4,000 RU/s, and pass that into the create database or container method of the SDK, and that's it. The only difference for autoscale is that you call the create autoscale throughput method instead. Now, can I change back and forth like you were doing in the portal through the SDK? As of today you cannot, but it is in the works. Currently the only ways to migrate between provisioned (manual) and autoscale are PowerShell, the CLI, and the portal. And the Azure management SDK too. Right, but support in the data-plane SDK is definitely in the works. Okay, cool.

That's really awesome, Rav, and one thing that's also really great about this is that the workload actually needed 4,000 RU/s, so as soon as you let it go to 4K it used it. But later, when the workload becomes quiet and drops back down to nothing, or 400 or 500 RU/s of actual usage, instead of being billed for 4,000 the whole time as you would with manual, we just scale you right back down to 400 and you can continue from there. Yeah, that's the benefit: you don't have to change anything, you can just leave it and we'll scale you down. That's cool.
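(As a companion to what Rav just showed, here is a hedged sketch of that SDK surface: the two ThroughputProperties factory methods for creating a manual versus an autoscale container, plus reading the setting back and raising the autoscale max later. Names are placeholders, and note that, as discussed above, migrating an existing container between manual and autoscale goes through PowerShell, the CLI, the portal, or the management SDK rather than these calls.)

```csharp
using Microsoft.Azure.Cosmos;

CosmosClient client = new CosmosClient("https://<account>.documents.azure.com:443/", "<key>");
Database database = (await client.CreateDatabaseIfNotExistsAsync("demo-db")).Database;

// The only difference between the two modes at creation time is which factory you call.
ThroughputProperties manual = ThroughputProperties.CreateManualThroughput(4000);       // fixed 4,000 RU/s (shown for comparison)
ThroughputProperties autoscale = ThroughputProperties.CreateAutoscaleThroughput(4000); // scales 400-4,000 RU/s

Container container = (await database.CreateContainerIfNotExistsAsync(
    new ContainerProperties("demo-container", "/pk"), autoscale)).Container;

// Later: inspect the current setting, or raise the autoscale max (for example 4,000 -> 6,000 RU/s).
ThroughputResponse current = await container.ReadThroughputAsync(requestOptions: null);
Console.WriteLine($"Autoscale max RU/s: {current.Resource.AutoscaleMaxThroughput}");

await container.ReplaceThroughputAsync(ThroughputProperties.CreateAutoscaleThroughput(6000));
```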
Yep. So one more thing I can show: in addition to seeing from the console app that autoscale has kicked in and you're getting higher throughput and fewer 429s, you can also monitor this in the built-in Cosmos DB diagnostics logs and metrics.

Let me orient ourselves here. I'm in the same Cosmos DB account that Rav was just using, and I've gone to my logs. Previously I configured this account to automatically push its diagnostics logs into Log Analytics, so I already have that set up, and what that means is Log Analytics gives me per-second granularity into what's happening in my workload. Typically we only turn this on when we really need to debug something, but it's nice to be able to show the actual impact of autoscale here. All the data for every request goes into this table called CDBDataPlaneRequests, and all I'm doing is filtering to the database and collection that Rav used, filtering to status code 200 (these are the requests that did not get throttled and actually succeeded), and summing the number of successful requests by TimeGenerated, binned at the one-second level. This is about when we switched from manual to autoscale, and you can see the successful request count increasing here, so that's the actual throughput of requests going through. You can monitor throttles the same way.

If you don't want to use diagnostics logs and just want something easier and built in, with no custom queries required, you can go to Insights. These are built-in views and metrics that come out of the box with no configuration required, and here under the throughput and overview tabs, where I've already filtered to the database that was created, we can see the throttled requests over time: the period where we were getting throttles because we were at 400 RU/s, and then, as Rav showed, it eventually went down to zero. So those are two easy ways to monitor just how well autoscale is actually working for you.

And because this is all in the monitoring stack, you can create alerts on these things too, if I'm not mistaken? Yeah, definitely. One common thing customers like to do (maybe a little less common now with autoscale, but it still applies) is to monitor the total percentage of 429s they're seeing in their collection. Having a few 429s is not too bad; if you see one 429, don't freak out, it actually means you're using the RU/s that you provisioned, which is a good thing. Typically we say that for production workloads, between one and five percent of your requests getting 429s is okay, depending on the end-to-end application you're running and any SLAs you might have. If you're above five percent, that's usually a pretty good sign that you actually do need more throughput, in which case, if you're using autoscale (say 4K was enough for the demo, but you're still getting a lot of throttles at 4K), you might go to 6K or 8K, or if you're using manual throughput you can increase correspondingly. So that's a great callout, Mark, on a common scenario you can use these insights for. Yeah, and it makes total sense why having some 429s is good: you want to utilize the throughput you're paying for, so you might as well be right up on top of that total utilization.
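(The exact query isn't captured in the transcript, so here is a hedged reconstruction of the kind of Log Analytics query Deborah describes, wrapped in an optional Azure.Monitor.Query call in case you'd rather run it from code than in the portal. The database and collection names are placeholders, and the table and column names are the ones Cosmos DB diagnostic logs expose in Log Analytics, so verify them against your workspace.)

```csharp
using Azure.Identity;
using Azure.Monitor.Query;

// KQL roughly matching what was shown: successful (HTTP 200) requests per second
// for one database/collection, from the CDBDataPlaneRequests diagnostics table.
const string kql = @"
CDBDataPlaneRequests
| where DatabaseName == 'demo-db' and CollectionName == 'demo-container'
| where StatusCode == 200
| summarize SuccessfulRequests = count() by bin(TimeGenerated, 1s)
| order by TimeGenerated asc";

// Optional: run it programmatically instead of pasting it into the Logs blade.
var logs = new LogsQueryClient(new DefaultAzureCredential());
var response = await logs.QueryWorkspaceAsync(
    "<log-analytics-workspace-id>", kql, new QueryTimeRange(TimeSpan.FromHours(1)));

foreach (var row in response.Value.Table.Rows)
    Console.WriteLine($"{row["TimeGenerated"]}: {row["SuccessfulRequests"]}");
```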
And then we do retries at the SDK level, and anything that goes over the wire should include some sort of transient-fault handling and retry logic, a Polly library or something like it, so customers should also build their own retries around that. They can lean on our SDK for it, because on a throttle we tell you when to retry, in milliseconds, so they can back off, spread requests out, and handle the exception that way. It makes sense: you want to saturate your throughput, I think, is what we like to say. Exactly.
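(In that spirit, here is a hedged sketch of a hand-rolled retry that honors the retry-after interval the .NET SDK surfaces on a 429 via CosmosException.RetryAfter. The helper name and the fallback delay are invented; in most real applications you would leave the SDK's built-in retries on and layer a policy library such as Polly on top rather than writing this yourself.)

```csharp
using System.Net;
using Microsoft.Azure.Cosmos;

// Hypothetical helper: retry a Cosmos DB call on 429s, waiting as long as the
// service suggests (falling back to a small default if no hint is given).
static async Task<T> WithRetriesAsync<T>(Func<Task<T>> operation, int maxAttempts = 5)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return await operation();
        }
        catch (CosmosException ex) when (ex.StatusCode == (HttpStatusCode)429 && attempt < maxAttempts)
        {
            TimeSpan delay = ex.RetryAfter ?? TimeSpan.FromMilliseconds(100);
            await Task.Delay(delay); // spread retries out instead of hammering the container
        }
    }
}

// Usage, assuming a Container and a document from the earlier sketches:
// var response = await WithRetriesAsync(() => container.CreateItemAsync(doc, new PartitionKey(doc.pk)));
```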
Very cool. All right, anything else you want to show? Yes, one other thing we wanted to talk about today. A super common question we get is: now that I understand autoscale and I see the benefits, how do I know whether my particular workload, which is already running in production today, would benefit from autoscale, before I just turn it on and see what happens? There are a couple of ways. You can analyze this for yourself, and we also have built-in Azure Advisor recommendations, where we analyze your workload over the last seven days and tell you whether you would save by using autoscale based on that history. Does that show up in the portal, in the Azure Cosmos DB blade, or in the recommendations that pop up when you log in? Yeah, you know how when you log into the Azure portal it tells you whether you have new recommendations? If you go there, it's under Cost. If you don't have a recommendation, it means we analyzed it and autoscale doesn't make sense for you; you're fine with manual, stay where you are. But if we find you could save significantly by moving to autoscale, we'll tell you, and we'll also try to quantify the savings based on that seven-day period. Oh, that's cool, so we're analyzing people's actual usage of their database? Exactly, both the recommendation to turn it on and how much to set it to. Very cool. Do we also recommend going the other way, back to provisioned? That's a really good point; it's actually not hard to do, and it's something I've wanted to do for a while. It's much more rare: if you take 100 workloads, I'd say around 85 of them would benefit from moving to autoscale, maybe five to ten would benefit from staying where they are, and then there are around five already using autoscale where the workload is actually pretty steady, so there's no real benefit for them. But it's definitely something we should look at and consider doing.

So one thing I did want to share more about: while we do have the autoscale advisor to help customers, it's also good to understand the intuition behind when to use autoscale and when not to. To that end I wanted to show some metrics you can use today, and also walk through a couple of quick math examples to give you some intuition for when it makes sense and when it doesn't, so you can apply it to your own workload. Sounds good.

In this tab is a different account where I previously ran another autoscale-style workload, except this time I pushed the scale up to 50K, just to show a much higher-throughput workload and to simulate one that is quite spiky over time. This is the metrics blade; think of it as a place where anything you see in Insights you can customize, adding your own filters and building your own views. What I've done is take the Normalized RU Consumption metric, which is a value between 0 and 100 that measures how much of your provisioned throughput you have utilized: 100 percent is fully utilized, zero percent is no utilization. I have it filtered to this collection, so you can see that over the past 24 hours it peaked at around 50 percent of the throughput I've provisioned. By the way, I set this to autoscale with a 50K RU/s max, which is pretty high throughput for most workloads, so don't look at this and think you need 50K on yours; it's really just to push the scale. I can see there are many hours where I'm basically idle or at very low utilization, around five percent, but sometimes I spike to around 55 percent, maybe even 60 or a bit higher.

The magic number for figuring out when it makes sense to use autoscale versus not is this: take the highest utilization in each hour and average that percentage over a time period, typically a month, which is about 730 hours. If that average is more than 66 percent, it means I'm actually using my throughput quite a bit, consistently consuming 66 percent or more of the 50,000 RU/s I provisioned, which means my workload is pretty stable; and if it's pretty stable, you can actually get a lot of benefit from moving to manual throughput. But if that number is less than 66 percent, the workload is quite variable, which means you'll benefit from autoscale: even if it costs a bit more during the peaks to pay for, say, half of 50,000 RU/s, in the other hours where you have no usage you're only paying for the minimum of 5,000 RU/s, compared with keeping 50,000 RU/s provisioned for the whole month.

One really easy way to do this is, once you've created this view with the metric filtered to the database and collection you want, there are a couple of steps. First, under chart settings, make it a bar chart; that's optional, I just like it aesthetically because it's easier to read. Then set the time granularity to one hour, because autoscale billing keeps track of the RU/s you actually needed each hour, and each hour you're billed for the highest RU/s we had to scale you to in that hour. So we change the granularity to one hour, and we'll just use the last 24 hours here, though I'd recommend at least two weeks or a month if you have that much data, and hit Apply. Now each hour I know exactly the highest utilization I hit in that hour, and from here I can download to Excel. We could also go value by value and average them by hand, but let's let Excel do the work; I downloaded this spreadsheet earlier today. Let me zoom in; Mark, tell me when it's readable. Yeah, that's readable. So again, this is just the spreadsheet version of what you saw in the UI, and all we do is take the average, which ends up being around 22 percent utilization, because I have a lot of hours that are very low or idle. So if this were my workload for the whole month, and I expected to maybe spike to 50K RU/s one day, I would just set autoscale and sit back and enjoy the savings.
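(Deborah does the averaging in Excel; the same back-of-the-envelope check could be done in a few lines of code over the exported hourly values. The sample numbers below are invented, and 66 percent is the rule of thumb from the episode: below it, a variable workload tends to come out cheaper on autoscale; above it, manual throughput tends to win.)

```csharp
using System.Linq;

// Highest Normalized RU Consumption observed in each hour (one value per hour),
// as exported from the metrics blade. These sample values are made up.
double[] hourlyPeakUtilizationPercent =
    { 5, 3, 4, 8, 55, 60, 22, 10, 4, 3, 2, 48, 35, 6, 5, 4, 3, 12, 58, 40, 9, 5, 4, 3 };

double average = hourlyPeakUtilizationPercent.Average();
Console.WriteLine($"Average hourly peak utilization: {average:F1}%");

Console.WriteLine(average < 66
    ? "Variable workload: autoscale is likely the better fit."
    : "Steady workload: manual (standard) throughput is likely cheaper.");
```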
One other thing with this workload: if you look at this and see that you've provisioned a 50K autoscale max RU/s but you're consistently only using around half of it, that might be a sign that you don't really need 50K; maybe you can keep autoscale and drop the max to something like 45K or 40K, whatever risk of throttling you're willing to tolerate for your workload. So basically you can use this, get the numbers, crunch them, and make a genuinely objective determination as to whether autoscale makes sense and how much you should provision. I think this is great. I usually just use Excel for making pretty graphs, but you're actually using it to make real dollars-and-cents decisions about which throughput mode to use and how much. Cool stuff.

All right, anything else you want to talk about or show? Folks, I'm going to throw a couple of URLs up here: an intro where you can learn about some of the things Deb was just talking about, like when you would use autoscale and when it makes sense cost-wise; the FAQ I put up earlier, which answers a lot of great questions; and a page that will help you get started creating and configuring your containers, or even your databases, for autoscale.

There's actually one more thing I want to show, Mark, if you've got a couple of minutes. The next most common question we get, after "should I use autoscale or not," is: how do I actually know what autoscale has scaled me to behind the scenes? There's another metric, another chart, you can use for that; let me change this back to a five-minute view. There's a metric called Provisioned Throughput, and the way it works is: if you're using manual throughput, Provisioned Throughput is exactly what you've provisioned, and if you're using autoscale, Provisioned Throughput is whatever Cosmos DB scaled you to as your workload was consuming RU/s. Here the orange line is my autoscale max throughput, the history of what I set as my max, and then over time you can see the provisioned throughput change. In the hours where my normalized utilization actually spiked (let me scroll down so you can see these numbers) I'm scaling to somewhere between 17K and 20K RU/s, and in the hours where there's no usage or very low usage I'm just scaled at the minimum of 5K RU/s. So from this graph, too, if you see that you're rarely using the actual max, that can be a sign to keep autoscale but lower the max RU/s, if you don't actually need that many. Yeah, that's a good graph to keep; people should definitely run that one.

All right, any other goodies to show? Let me see if the audience has any questions. No questions as of yet, so maybe people are all off clicking around in the portal and giving this a try. I think Rav and Deborah did a really good job explaining things, and I love these graphs; this is really straightforward once you know where to look. Use Azure Monitor, look at your actual usage in there, and figure out whether autoscale is the right throughput choice for you and how much you want to set. I love this stuff.
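(If you would rather pull those same charts programmatically than build them in the metrics blade, a hedged sketch using the Azure.Monitor.Query metrics client might look like the following. The metric names, "NormalizedRUConsumption" and "ProvisionedThroughput", are the ones surfaced in the portal and should be verified, and the resource ID is a placeholder.)

```csharp
using Azure.Identity;
using Azure.Monitor.Query;
using Azure.Monitor.Query.Models;

var metrics = new MetricsQueryClient(new DefaultAzureCredential());

// Full ARM resource ID of the Cosmos DB account (placeholder).
string resourceId =
    "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.DocumentDB/databaseAccounts/<account>";

MetricsQueryResult result = (await metrics.QueryResourceAsync(
    resourceId,
    new[] { "NormalizedRUConsumption", "ProvisionedThroughput" },
    new MetricsQueryOptions
    {
        TimeRange = new QueryTimeRange(TimeSpan.FromDays(1)),
        Granularity = TimeSpan.FromHours(1),                    // hourly, to mirror the billing granularity
        Aggregations = { MetricAggregationType.Maximum }
    })).Value;

foreach (MetricResult metric in result.Metrics)
    foreach (MetricTimeSeriesElement series in metric.TimeSeries)
        foreach (MetricValue point in series.Values)
            Console.WriteLine($"{metric.Name} @ {point.TimeStamp}: max = {point.Maximum}");
```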
Okay, well, thank you very much for joining me this week; this has been great. If people have a question they can ask in the chat, or they can hit us up on Twitter. Just a few announcements for you all. We've got a new Azure Cosmos DB user group starting in September; September 22nd will be our very first virtual user group meeting, so come along. The link will take you to Meetup, where you can join and RSVP. The first one will include me; I'm going to talk about modeling and partitioning in Cosmos DB, and we'll have a couple of these really fancy coasters that you can maybe win in a drawing. That's 2 p.m. Pacific time, Wednesday, September 22nd, which is 10 p.m. London time and 7 a.m. Sydney. The link you want is the aka.ms Cosmos DB user group link.

Also coming up in a couple of weeks is Azure Serverless Conf, September 29th through the 30th. We'll have three live streams, one for the Americas, one for Asia Pacific, and one for EMEA, over those two days, as well as a whole bunch of on-demand sessions, all about how to build event-driven apps with Azure Functions, Azure Cosmos DB serverless, SQL serverless, and Logic Apps, so come check that out.

Also, if you have any show ideas, feel free to hit us up on Twitter at Azure Cosmos DB. Next week I've got another new guest, Theo van Kraay. Theo hasn't been on the show before; he manages all of our Cassandra API features, and he's going to come on and talk about a relatively new offering, Azure Managed Instance for Apache Cassandra (it just rolls off the tongue). We'll talk about why we built this offering and how to choose between it and our fully managed Cassandra API, so if you're running Cassandra on-prem or maybe in a VM, this is definitely an episode you'll want to check out; it's pretty cool stuff that we've built and are offering to customers.

So Deborah, Rav, thank you so much; it was great having you on the show this week. Deb, I think you're coming back in a few weeks if I'm not mistaken. Oh yeah, hierarchical partitioning. The team Rav is on, the elasticity team, is making some really great improvements and new features this year, so I'm really excited for that. I'm super excited too; pretty much everything you all work on is awesome when it comes to the elasticity features, but hierarchical partitioning (that's also a tricky one to say) is very cool, so I'm definitely looking forward to that. Okay, well, thank you folks for joining us, and we'll see you all next week. Deb and Rav, thank you very much. Thanks for having me. Okay, bye.
Info
Channel: Azure Cosmos DB
Views: 262
Id: nr6fnRrC0YM
Length: 31min 52sec (1912 seconds)
Published: Fri Sep 17 2021