Hey everyone, welcome to this video all about Azure cost optimization. I wrote a blog article a couple of weeks ago and said I'd create a video summarizing the key points, and it actually took me a while to work out the best way to create it - whether it should be whiteboarding or all PowerPoint. I settled on something in the middle, where I'll walk through the PowerPoint I created. Really it's all about this: these are challenging times, but even in the best of times we don't want to waste money. We like money, and we want to spend it the right way. So what we'll look at here are the things we think about, the things we can optimize, and how we understand what we're spending when we leverage Azure.

We're going to walk through some cloud cost basics - how we get billed for things in the cloud, how I view my spend - and understanding accountability, which is really a key thing. Many of us remember VM sprawl on-premises: we had hypervisors, people created virtual machines, and we had no idea who owned a certain VM, so you could never delete it - maybe it was running some mission-critical process, most likely it was doing nothing, but we couldn't risk it because we didn't know who used it. We want good accountability in the cloud. Free stuff: there is free stuff in Azure, and we want to take advantage of it before we start paying for things. Help with migration: we have existing workloads we want to take to the cloud - what tools are available to help? Right-sizing is critical: on-premises we would often over-provision, and I don't want to take that to the cloud, so what can I do to understand what a resource actually needs before I move it, and then carry on optimizing for the entire life of that resource? Then a few benefits - reserved instances and hybrid benefit - to help save money, then optimizing specific services: compute, storage, databases, network, and what to consider for each. And Azure Advisor is my friend: I want to talk a bit about Azure Advisor and how we should be using it constantly. There's a lot to cover, but I don't want this to be too long - and to me these days "too long" is more than an hour - so I'll go through it pretty quickly and maybe link to other videos or answer questions in the comments below. Before we get started: if this is useful, please give it a thumbs up, subscribe, comment, etc.

I want to make sure this is totally clear: optimization is not something we do once. It's not something we do at the start of a project and then walk away from. It is an ongoing effort. Yes, before we deploy, before we architect, we think about these things, we learn what's available in Azure, then we design our service to meet the requirements in the most optimal way. Then when it's running we tune and tweak, and we stay up to date with the services out there, because maybe there's a new option for the service we use, or a completely new service that lets us change our approach. This is an ongoing effort, not a one-time thing - I'm going to repeat that a few times throughout this video.

So, cloud cost basics. Fundamentally, the cloud is consumption based: I pay for what I use. On-premises, we provision based on the future worst-case scenario.
I remember when I was a consultant designing an Exchange deployment: we'd ask how much mail you think you'll have in five years' time if you grow this much, and that's the server we would purchase, because it had to last that long - whatever the worst (or best) case growth scenario was, we'd size accordingly. Virtual machines have helped with that, but on-premises we still tend to bloat, and we tend to scale up: if something gets busier we make it bigger - shut down the VM, add more CPUs, add more memory, start it back up again. We scale up; we make things bigger. In the cloud we don't want to think that way - we want to scale out - and I'll come back to that. In the cloud we can scale up, make things bigger just like on-prem by adding CPUs and memory and restarting the virtual machine, but we prefer to scale out and in. It's a lot more dynamic: we think about having more instances of something. If I have five instances I pay for five; if I have three I pay for three. That's the better option.

Let's think those through. If I scale up, I make something bigger: here's my little virtual machine doing its job, one VM doing that work. If I get busier, more load, then yes, I can make a great big powerful virtual machine. The pro is that I'm increasing my scale: more CPU, more memory, maybe more IOPS, more capacity, so I can take more requests - I'm meeting the requirement in terms of scale. The cons: it likely requires a restart to do the resize operation; there are limits to how much resource a single instance of an app can actually make useful use of - an upper limit on CPUs, memory, IOPS; and it does nothing for my resiliency. If this one big powerful instance gets sick or needs maintenance or a restart, my entire service is down. So I'm increasing my scale but not my resiliency.

Whereas if I scale horizontally: same instance, but as my workload increases I add more instances - paying for more instances the same way I was paying for the bigger VM before - and if my workload decreases I shut some of them down. The pro is that I still get the increased scale as I need it, but I'm also increasing the redundancy and resiliency of my service, because now I have two, three, four instances spread over different racks, maybe different data centers via availability zones, maybe even scaling across multiple regions active-active. Not only am I increasing scale, I'm increasing resiliency, and that's a really important point. There's no downtime: I'm not adding CPU or memory to a single instance, which requires a restart; I just increase the number of instances, they sit behind a load balancer which starts sending requests to them. It's very dynamic, I'm only paying for what I need, and it's zero downtime. The cons: I might require a load balancer, depending on the app and the architecture, and I might have to pay for that - a standard load balancer across availability zones has a charge. And my app may have to be tweaked: today my application might be single-instance and not work with multiple instances, so I may have to change it to work multi-instance. But it's worth doing; this is definitely the future. In the cloud, scale-out is the winner - I really want that capability, and it's key when I start thinking about cloud workloads. If it's a single instance, yes I can run it, I can put it on a premium SSD and still get a three-nines SLA, but the higher SLAs - zero-downtime maintenance, staying up and running through an unplanned failure because I'm resilient to a rack or even a datacenter failure - need more than one instance. So I definitely want to think in that direction, to give me the ability to dynamically scale.
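As a rough sketch of what that dynamic scale-out/scale-in looks like in practice, here are hedged Azure CLI autoscale rules against a scale set; the resource group, scale set name and thresholds are all made up for illustration:

```bash
# Autoscale setting on a hypothetical VM scale set: 2 instances by default,
# allowed to move between 2 and 10 based on rules.
az monitor autoscale create \
  --resource-group rg-demo \
  --resource vmss-web \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name autoscale-web \
  --min-count 2 --max-count 10 --count 2

# Scale out by 1 when average CPU is above 70% over 5 minutes...
az monitor autoscale rule create \
  --resource-group rg-demo \
  --autoscale-name autoscale-web \
  --condition "Percentage CPU > 70 avg 5m" \
  --scale out 1

# ...and back in by 1 when it drops below 30%, so I stop paying for instances I don't need.
az monitor autoscale rule create \
  --resource-group rg-demo \
  --autoscale-name autoscale-web \
  --condition "Percentage CPU < 30 avg 5m" \
  --scale in 1
```

The same idea applies to App Service plans and other scalable resources; the point is the instance count, and therefore the bill, follows the load.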
Now, when I think about planning for costs, there are often several ways to meet a requirement. Make sure I understand all the options from a compute, storage and networking perspective; make sure I understand the high availability requirements, disaster recovery requirements, retention requirements - all of these help me craft the right service, and maybe there's a SaaS offering out there that can just do this thing. Think about the business requirements first, then turn those into technical requirements, then think about an architecture. Understand the various cost components of every part of the solution: when I architect in the cloud I'm probably bringing together multiple things, so understand the options across all of those components and then bring them together to create the service.

If I want to understand what something is going to cost me, there's the Azure pricing calculator. It lets me say: I'm going to use this component, with this many instances running for this many hours, or this much capacity, or this many IOPS. I can plan in cost-saving options like reserved instances, hybrid benefit, maybe a discount I have as an organization, and I can export it as a CSV file. So if I want to understand my pricing, I go and use the pricing calculator. Make sure you don't forget about things like backup and disaster recovery, and monitoring is huge: what's it going to cost me to patch the OS, to patch the runtime, to monitor whether the OS is healthy and the app is healthy? For a database: is my tuning right, are my indexes healthy, are my partitions correct, am I sharding the right way - is that work I'm doing on my own, or is there some built-in intelligence to help me? There's HA and DR, as I mentioned, and then there's service management - the work of managing the underlying layers, like runtimes and middleware. When I move into platform as a service offerings I'm managing a lot less than with IaaS: with a virtual machine I'm worrying about the operating system, the runtime, any middleware on it, plus the app and the data; with platform as a service I'm just worried about the app and the data. That's a key point. In the cloud we never worry about the physical fabric - the compute, the storage, the networking, the hypervisor; those are Azure's responsibility. With IaaS, the lowest level I care about is the operating system and the tasks associated with that operating system.
And there's lots more - many other things to think about when I think about costs. The point I want to get across is to consider the complete solution: yes, the Azure pricing is a big part of it, but there are other costs as well - humans to manage the various components, maybe licenses for software you use for patching or backup or whatever it might be. That's why, as I get to PaaS and SaaS offerings, it's not just one component - the whole pricing model shifts. I'm not worried about the operating system, I'm not even worried about runtimes and middleware anymore; there's a lot less for me to focus on. And really, as a business, I care about the app - the app provides the business value. Very rarely does an operating system or a runtime provide business value. As a company we care about what differentiates us from our competitors, and that's our app investments, so the more I can focus on that and let someone else worry about the stuff that makes it run, the better.

So, viewing my Azure spend: Azure Cost Management is phenomenal. It has really evolved over the last year - there was an acquisition and a lot of that functionality moved into Cost Management - and it gives me a number of capabilities. The primary one is cost analysis, which gives me insight into cost across my subscriptions and resource groups. I can add filters, maybe based on the type of service, the region, tags - tags are huge, you'll hear me talk about them a few times; they're key-value metadata I can apply to my resources - plus different date ranges and different types of view. I can really pivot and dive into the data: what am I spending, and where is the trend going? So definitely go into your subscription, look at Cost Management, and have a look around; familiarize yourself with the capabilities and the different filters and groupings. It makes it easy to quickly see: I'm spending this amount of money - why, and on what? I can drill into the data to find the root of the costs. There's also a billing API, so I can write my own tooling to pull that usage data into a different analysis tool - there are things like Power BI I can leverage to do my own analysis. If I'm an enterprise customer with an Enterprise Agreement, I can obtain an API key that lets me pull the enterprise-level data and do my own analysis across the entire enrollment.

I can also do budgets. Budgets are great because they let me set actions at certain thresholds. Most commonly the budget is dollar-based, but I can also use metrics, and I can set budgets at different scopes: a management group, a subscription, a resource group. Then I can say, your budget is $1,000. You may have heard of action groups - they're a common technology throughout Azure; we originally saw them with alerting, where in Azure Monitor an alert like "CPU crossed a certain threshold" calls an action group. Just as the name suggests, an action group has a number of actions tied to it: send an email, send an SMS, call a webhook that kicks off some RESTful process, maybe raise a service ticket. I can use those same action groups at my budget thresholds. So I could say: when I hit 70% of my budget, call this action group - maybe it just sends an email to the owner of whatever scope (resource group, subscription, management group) saying, you're at 70%, here's a report of what you've spent so far. Maybe at 90% it sends it in a bolder font, in red: stop spending my money. At 100% it sends a webhook to a service ticketing system and some big burly security guard comes and bops them on the head and tells them to stop spending. We generally don't like to automatically shut things down: if it's a production service spending more, it might be because business is great and lots of people are using the service - I don't want to shut that down. Dev/test, maybe; production, probably not. But I still want to know, and to make sure there's not some huge unplanned bill at the end of the month. So we can do whatever notifications and escalations are right for our service; budgets let me call those action group actions at those thresholds. And again, be careful about just stopping stuff.
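As a hedged sketch of that pattern with the Azure CLI - the group names, email address and amounts are all placeholders, and the budget command's exact syntax can vary by CLI version (the threshold notifications that call the action group are often easier to wire up in the portal or an ARM/Bicep template):

```bash
# An action group that emails the owner of the scope.
az monitor action-group create \
  --resource-group rg-governance \
  --name ag-cost-owner \
  --short-name costowner \
  --action email owner owner@contoso.com

# A basic monthly cost budget at subscription scope.
az consumption budget create \
  --budget-name monthly-cap \
  --amount 1000 \
  --category cost \
  --time-grain monthly \
  --start-date 2020-06-01 \
  --end-date 2021-06-01
```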
Free stuff - I like things for free. There are a number of aspects of Azure that are free, and others that give you a certain amount free before you start paying. I'll cover a few of them quickly - my favorites, if that's the right word - but there are many others; go and look at the documentation that walks through everything you get for free. I'm not talking about a free account, where I sign up for a 12-month getting-started offer and get certain services free for that year or some credit; these are things that are always free. (Also, if you have MSDN you get a certain amount of Azure credit as part of your MSDN subscription - that's your personal sandbox to go and learn with.)

Azure Policy is free. Azure Policy is how I apply guardrails to my environment: with DevOps and the new processes we use to deploy, I don't have someone manually reviewing requests and checking them against some policy document, so Azure Policy gives me those guardrails. Azure Active Directory is the core identity for everything, and there's a free SKU. Cost Management and budgets are free. Network ingress - stuff coming into Azure - is free. AKS cluster management is free. There's basic security information, with more being rolled in, including identity recommendations, for free. Role-based access control, resource groups and management groups are free. Azure Advisor is free. And there's more: I get five gigabytes of network egress and then pay for the rest; I can do one million function executions - I use this to run PowerShell in the cloud and it doesn't cost me anything; and I can pick one Cosmos DB account per subscription to have the free tier of 400 request units per second, and if I go beyond 400 I pay for what goes above that.
There are also ten free App Service apps, and 50,000 free B2C authentications - for a customer-facing app I don't pay for the user accounts anymore, I pay for the authentications (and for MFA interactions), but the first 50,000 regular authentications are free. So check out the documentation - I'm linking to it in the YouTube description - and see all of those things.

Now let's talk about the more common savings. OK, I've got past the free stuff and I'm using resources - what can I do to optimize that cost? Azure Reservations: they used to be called reserved instances, back when it was just virtual machines; now it's moved beyond that and there are other types of services I can reserve. Think of it like pre-booking a hotel. If I just show up at the hotel and say I want two nights, I pay a certain amount, because the hotel has to react to that - it has to expect some rooms to be empty, so it up-charges the normal rooms to make sure it covers its bills and still makes a profit. Whereas if I know I'll be there for two months and I pre-book, I get a cheaper rate, and the further out I book - three months, six months - the cheaper it gets. Maybe for a couple of nights out of that three months I'm not in the room - popping home during a long business trip - but it's still cheaper to just reserve the room. Azure reservations are the same idea: for many types of Azure resource I can pre-commit for a one- or three-year term. Obviously I get a bigger discount for three years than for one, and these are big discounts - the exact number varies, but it's a big number. Many types of service can be reserved - not just compute; there are reservations around Cosmos DB, storage, and other resource types. I'm committing: I'm going to use this for the next one or three years, so I pay less for it. You have to balance this, because if I'm not using it, I'm still paying for it - I've pre-booked the room for three years, and if I'm not in it for a couple of weeks I'm still paying. So I use reservations for what I know I'll need - my base level, the amount of resource I'll always need for the next one or three years. I might need more at times, but those bursting amounts - remember, I'm scaling horizontally - I'll pay the regular rate for, because it's not economical to pay the reserved price for something that sits unused too much of the time. It's a balancing act, and you can tweak these as you go.

For compute, originally I had to buy a reservation for an exact VM SKU; now it's based around a VM group. I buy cores of a certain group - for example Dv3 - and I can break that up into instances with two virtual CPUs or eight virtual CPUs. I just pay for, say, a hundred reserved cores of Dv3 for three years, and if in total I'm using a hundred virtual CPU cores of that type, I don't have to do anything. This is a billing mechanism: the billing engine wakes up each hour, looks at my reserved instances - "you've got a hundred Dv3 virtual CPUs; what's running right now?" - and applies the discount to the first hundred it finds. That's all it is; it's not named instances.
It's totally flexible: every hour the billing engine wakes up and works out how to apply the reservation, and that's all it is - a billing engine; I don't have to do anything. As an example, say I've got four apps and they're all Dv3s - in this case D2 v3s, two-core virtual machines - and I've bought two reserved-instance cores (not a hundred; let's keep it simple). When an app is shown in the darker color it's running. The billing engine wakes up and applies the discount to my first app. An hour later a second app starts, but I only own two reserved cores, so the discount can't apply to the second application. When the first app shuts down, the reserved-instance price applies to the second app instead. Then nothing's running, but I'm still paying for those reserved cores for that hour. Then app 3 wakes up, then app 4 - the reservation just applies wherever it can. So think about what my base workload is - or maybe even a bit more than my base, if it still works out cheaper even when it's not used 10% of the time; if the discount were 50%, that would still be worth it. Do that math when you think about the right reservation for you. You can switch between VM groups - there's a mechanism to move from, say, Dv3 over to E-series or something else - and I can stop early if I need to, but there's a penalty, because I was getting a discount based on a commitment. If I know I have a certain base usage, reservations make a huge amount of sense - especially at three years, where the saving is huge, but even one year gives big savings - and the pricing calculator shows you the percentage savings for reserved instances.

Next, Azure Hybrid Benefit. The point of Hybrid Benefit is that I have on-premises licenses today with Software Assurance, and it lets me take those licenses and leverage them in Azure. For Windows Server Standard, every 2-processor or 16-core license can cover two instances of up to eight cores each, or one 16-core instance - the instances can be smaller, say four cores, but it's still only two instances; I can't do eight one-core VMs. I'm taking that license and, instead of using it on-prem, using it in the cloud. If I have Datacenter edition, the processor and core mapping is the same, but it's simultaneous: I can keep using the license on-prem and use it in the cloud. What that means is that for my Windows virtual machines I pay pretty much the same as for a Linux machine - I'm not paying the Windows license cost anymore. For SQL Server there are Standard cores and Enterprise cores: with Standard, one on-premises core gets me one cloud vCore in the General Purpose or Hyperscale tiers, or SQL Standard in an IaaS VM; with Enterprise, one on-premises core gets me four cloud vCores in General Purpose or Hyperscale, or one cloud vCore in Business Critical or SQL Enterprise in an IaaS VM. That one isn't simultaneous - I can't still use it on-prem, I'm moving it to Azure - but I get much cheaper pricing. And the key point: put Azure reservations and Hybrid Benefit together and the discount you can get on your Azure resources is crazy. So if you can, take advantage of that.
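As a small sketch of applying Hybrid Benefit from the CLI - the group, VM name, image and size here are placeholders:

```bash
# Create a Windows VM with Azure Hybrid Benefit, so the Windows license isn't billed.
az vm create \
  --resource-group rg-demo \
  --name vm-app01 \
  --image Win2019Datacenter \
  --size Standard_D2s_v3 \
  --license-type Windows_Server

# It can also be toggled on an existing VM.
az vm update --resource-group rg-demo --name vm-app01 --license-type Windows_Server
```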
Let's talk about migrating. I have existing workloads and I'm thinking about moving them to the cloud. Make sure you really understand the true requirements: how many times have I looked at an application deployed a certain way, asked people why, and heard "it was like that when we got here"? People don't understand the business requirements or the technical requirements - take time to understand them, and to understand how the workload actually runs. What is the performance of the application? Are there peak times, and are those peaks hourly, daily, monthly, annually? For data, clean it up before the migration. I may have a whole bunch of data I don't need anymore that's only there because nobody could be bothered to do anything with it. Do I really want to pay for it in the cloud - because in the cloud I pay for consumption, for the storage I'm using - and remember, I have to get the data into the cloud. Ingress may be free, but it costs me time, and depending on the type of data the system may have to be down while I'm copying it from on-premises into the cloud; if that's the case, the smaller the data, the better. So clean up the data before pushing it to the cloud.

There's Azure Migrate - Azure Migrate is a tool that helps with the assessment. It looks at the resources and their configuration and works out whether they'd be compatible with the cloud; it looks at their execution and what resources they're actually using; and it works out dependencies. When I start moving things to the cloud, it's no good moving half of an application and leaving half on-premises if it's chatty: the latency between on-prem and the cloud, even over a fantastic ExpressRoute connection, is a magnitude bigger than sitting in the same data center - in the same data center it's most likely sub-millisecond; data center to Azure is maybe 20 milliseconds, depending. So if I took the database, moved it to the cloud and left the app on-prem, it may not work very well. It may be fine, but I need to understand the dependencies. And likewise the other way: what does the workload need to function? I don't want a cloud service with a dependency on something on-prem, because then if on-prem fails, or the network connection fails, the cloud service doesn't run anymore. Ideally, if I move something from on-prem to the cloud, I don't want it relying on something on-prem or on that network connection. And there are tools to help migrate the data: for VMs it's an Azure Site Recovery-type capability - replicate and then fail over - and for databases there are also tools to help with the migration. There's a lot of good tooling to help with this.

Accountability and governance - I mentioned this before with VM sprawl. So many companies struggled with it:
we have this hypervisor, we don't have to buy servers for every project anymore, it's really easy to create VMs - maybe we even called it a "cloud" on-prem - and everyone created virtual machines. But it wasn't a true cloud: nobody was tracking who owned things, so all these VMs sprang up, servers filled up, we bought another server and another SAN, and we had no clue who owned what or what it did, so we couldn't shut anything down. We don't want to get into that state. It is critical that every resource we have in Azure has an identifiable origin: who created it, for what app, for what project, for what cost center. I want to be able to track that for every resource, so I can go to someone - some project, some cost center. Maybe I'm not doing chargeback, but I still want to know what each cost center or business line is using and what they're costing the company, and make them accountable for the business value: hey, you're spending $2,000 a month on Azure - what business value are we getting from it?

A logical structure - management groups, subscriptions and resource groups - helps hugely with that. Before deploying anything in Azure, get governance first: get identity set up, get the management group, subscription and resource group structure in place, along with role-based access control and policies. That core scaffolding enforces my governance - public IPs, which regions I can use, what resiliency I need, budgets, how much I can spend - and accountability as well: I want to track who created these resources. Tagging is our friend. Tags are key-value pairs: cost center, cost center number, creator, date created, whatever it might be. Use tags. And the cool thing is I can use Azure Policy - remember, that's one of our free things - to require certain tags to be present or the creation of the resource will fail; if I'm using an Azure DevOps pipeline or GitHub Actions, the resource won't get created unless those tags are in place. Now, tags are not inherited from the parent resource group. Every resource lives in a resource group, and I can put tags on resource groups, on subscriptions, on management groups - on everything - but I can attach a policy that says: if this tag is not present, copy the value from the resource group onto the resource. So I can essentially inherit from my resource group via policy. Tags are great for identification and searching - I can quickly find things - and from a billing perspective I can pivot on a tag: what is this particular project costing me, and who created that thing?

Make sure you assign policy at the right scope. With policy I can build that core scaffolding: no public IPs except on these subnets, can't use the more expensive SKUs, must have GRS replication on my storage account - though if it's dev/test I probably don't care about GRS, while if it's prod I probably do want the more expensive options. The top levels of the management group structure will be broader governance, capturing the key things we must always do; as we get closer to the resources it gets more specific. So when we think about policy and all of these things, do it at the right scope.
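As a hedged sketch of the tagging plus tag-inheritance idea - all names, the subscription placeholder and the cost center value are made up, and depending on CLI version the identity flag may be --assign-identity instead:

```bash
# Tag the resource group that owns the workload at creation time.
az group create --name rg-payroll-prod --location eastus2 \
  --tags costCenter=CC1234 owner=payroll-team project=payroll

# Assign the built-in "Inherit a tag from the resource group if missing" policy
# so every resource in the group picks up costCenter automatically.
# Modify-effect policies need a managed identity on the assignment for remediation.
defId=$(az policy definition list \
  --query "[?displayName=='Inherit a tag from the resource group if missing'].id | [0]" -o tsv)
az policy assignment create \
  --name inherit-costcenter \
  --scope "/subscriptions/<subscription-id>/resourceGroups/rg-payroll-prod" \
  --policy "$defId" \
  --params '{"tagName": {"value": "costCenter"}}' \
  --mi-system-assigned --location eastus2
```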
Just as a super quick reminder on resource organization: we have our Azure Active Directory tenant, then the root management group, then a whole hierarchy of management groups underneath that, then subscriptions under the management groups, then resource groups in the subscriptions, and resources in the resource groups. That's our structure, and we have policy, role-based access control and budgets that I can apply at any management group and have inherited; at a subscription, again inherited; at a resource group, again inherited. And yes, I can also apply them to individual resources, but we shouldn't: we should not be assigning RBAC, for example, directly to resources. Automations might do RBAC at a resource level; for us humans, the resource group is the lowest we should ever go. Resource groups bring together components that share a common lifecycle - created together, running together, deprovisioned together - so they probably share common permissions, common policy, common budget; we don't need to go any lower than that. This was all about setting things up, and it's critical for cost management. Policy can help me stop spending - if I make sure I don't use the big VMs, I'll save money; if I have a certain budget, it helps me control my spend - but I do need to deploy stuff.

So when I think about optimizing my service use, I have to think about how I actually do that in my environment, and I want to look at some considerations across a range of services. For any requirement, if there's a SaaS option, generally that's fantastic - we'll try to use it. The further I go from IaaS to PaaS to SaaS, the less I'm responsible for, the less human cost I have to run those things, the less installing I have to do, and generally that saves me money. But I can't do that for everything: if I'm building a custom app I can't use SaaS. For compute we'll try to push as far toward PaaS as we can. We have virtual machines - that's IaaS, individual resources; virtual machine scale sets, which add a certain amount of automation, deploying from a template; containers, and container orchestration with AKS; App Service plans; and serverless. Try to go as far along that spectrum as you can: if I can use serverless, phenomenal, let's use serverless; if I can't, App Service plans or AKS with containers; if I can't do that, VM scale sets; and then plain VMs. I love VMs - I'm an infrastructure person - but I'm still going to push the other way, because it's less for me to manage: I'm not patching operating systems or middleware, not worrying about antivirus or firewalls or runtimes. I want to minimize that and focus on the app that brings value to the organization.

For databases it's the same thing. Yes, I can install any database I want in an IaaS virtual machine, but if I can use Azure SQL Database, great - it's an evergreen service with high availability and backup built in and DR options, and I don't worry about upgrading SQL versions and the technical debt I have today on-premises, running a ten-year-old version of SQL Server. If it's an open-source database - maybe Postgres, maybe MySQL, MariaDB as well - there's an Azure managed version of that; it does the minor updates for me and has backup built in, so using that is better than installing my own in an IaaS virtual machine. If it's NoSQL, it's Cosmos DB. Always understand what's there - again, I'm trying to get as far away from installing things in a VM as I can.
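As a small sketch of that preference - a managed PostgreSQL server instead of hand-building Postgres in an IaaS VM; the server name, admin credentials and SKU here are placeholders:

```bash
# Managed Azure Database for PostgreSQL: patching, minor updates, backups
# and HA options are the platform's problem, not mine.
az postgres server create \
  --resource-group rg-data \
  --name pg-demo-01 \
  --location eastus2 \
  --admin-user pgadmin \
  --admin-password '<strong-password>' \
  --sku-name GP_Gen5_2 \
  --version 11
```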
I said it at the start and I'm saying it again midway through: optimization is not a one-time activity. We're optimizing constantly - when we design, when we deploy, when it's running - constantly re-evaluating. For all my workloads, be it compute, storage or network, I have to understand the usage pattern: the average run rate, the peaks, the duration of those peaks and their frequency. Is it a 30-minute logon storm against domain controllers every day? A five-hour window once a year - I'm a pizza restaurant and it's the Super Bowl? A few days a year - a retailer at Black Friday? A couple of months - a telco when the new iPhone comes out, or a tax prep shop leading up to tax day? Understand what they are, because when I'm sizing I don't want to run for the worst-case scenario all the time, but I do need to consider it in my architecture so I can burst up as I need to - and depending on what that peak is and how often it happens, I might architect that burst in different ways. Scaling out is great, but for a domain controller, do I want to add four domain controllers for half an hour and then delete them again? Probably not - maybe there's another way to burst for a 30-minute window.

For virtual machines there's a huge range of types and sizes: memory optimized, compute optimized, storage optimized, ones with GPUs, ones for high performance computing with RDMA network connections. Take time to understand what your workload is skewed towards - storage, CPU, memory, or just general - and find the tier and type that's the best fit; then within it the size: number of CPUs, memory, IOPS, throughput, number of data disks I can attach, network throughput. Look at all of those and pick the closest match to the model you have. Some types have bursting capabilities. The B-series can burst the CPU - like a cell phone plan where I roll over my minutes: I get a certain percentage of the CPU, maybe 10%, and if I use less than that I accrue credit; if I have accrued credit and need to exceed the baseline, I can burst up to 100% based on the credit I've banked. So for that domain controller with a 30-minute storm a day but otherwise pretty idle - fantastic, I don't need more domain controllers, I'll just burst at those times. The Lsv2 now supports uncached bursting of IOPS and megabytes per second - same idea: I get a certain amount of IOPS and throughput, bank credit when I'm using less, and burst up when I actually need to, so I can far exceed what I normally have for, say, that 30-minute window. Premium SSDs P20 and smaller can burst performance as well: a certain steady state, and bursting beyond it. All of these are designed to say: instead of buying a bigger VM or a bigger disk for some fairly rare burst, we'll let you have the burst. Azure Files premium can burst performance too. So look at what the requirement is, what the steady state is, and what the burst is - there may be other ways to solve it.
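As a rough sketch of the burstable approach - the VM name, image and size are made up; "CPU Credits Remaining" is the platform metric that tracks the B-series credit bank:

```bash
# A burstable B-series VM for the "quiet most of the day, short daily storm" workload.
az vm create \
  --resource-group rg-demo \
  --name vm-dc01 \
  --image Win2019Datacenter \
  --size Standard_B2ms

# Watch the banked credits to confirm the size actually fits the workload:
# if credits never build up, the baseline is too small; if they sit at max, it's oversized.
az monitor metrics list \
  --resource $(az vm show -g rg-demo -n vm-dc01 --query id -o tsv) \
  --metric "CPU Credits Remaining" \
  --interval PT1H
```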
Now, I can scale up: change the type, change the size - but it's going to be a restart. I can't dynamically add processors or memory, and even if the platform let me, very few applications will recognize that you've added CPU or memory and do something useful with it. A resize is a restart; that's why I prefer to scale in and out - it's dynamic, and it increases resiliency as well. Even for virtual machines I really want at least two: yes, I can run one and get my three-nines SLA, but ideally we spread over an availability set (different racks in the same data center) or availability zones (different data centers in the same region), and maybe I need DR as well.

Remember to shut them down. Yes, size them correctly and optimize the spend, but shut them down when you're not using them. And when I say shut down, I mean deprovision: going into the portal and stopping the VM, or using the REST APIs, the PowerShell module, or the AZ CLI - not doing a shutdown in the guest operating system. If I do Start > Shut down inside the guest, it's still provisioned on the fabric from an Azure compute perspective and I'm still paying for the compute. Deallocate it from the fabric and I stop paying for the compute; I still pay for the storage - I can't delete the storage or I'd lose my state - but I stop paying for the compute aspect of the virtual machine. There's an auto-shutdown capability for virtual machines (it uses DevTest Labs behind the scenes), so I can say: this is my dev environment, shut it down at 6 o'clock every night. I could write an Azure Automation runbook, or an Azure Function with a schedule trigger, to shut down workloads at weekends and overnight when I know I don't need them. Or just be mindful: I'm done with this, I'll shut it down. If I create a bunch of workloads for a test, delete them when you've finished - and put test workloads in their own resource group, since they function together, so I can just delete the resource group when the test is done and save the money.

And make sure you delete everything. For a virtual machine there's the VM itself - I'm paying for that; the OS disk - paying for that; data disks - paying for those; the network adapter; maybe a public IP address - paying for that too (I think in the classic model the first five were free, but in the ARM model they're not). I'm paying for more than just the VM, so if I delete the VM but leave the disks behind and they were premium SSDs, I'm still paying quite a lot of money. Clean those things up.
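Going back to the deallocate point, a couple of commands worth having in muscle memory - resource group and VM names are placeholders:

```bash
# Deallocate (not just shut down in the guest) so the compute stops billing.
az vm deallocate --resource-group rg-dev --name vm-dev01

# Deallocate everything in a dev resource group at the end of the day.
az vm deallocate --ids $(az vm list -g rg-dev --query "[].id" -o tsv)

# Bring it back when needed; you only kept paying for the storage in between.
az vm start --resource-group rg-dev --name vm-dev01
```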
Use good naming. Right now, if I use the portal, the portal auto-names things and I may not be able to work out what's associated with what; if I use templates, I name everything. So have a good naming standard as part of your governance - maybe it contains the region, the use; there's guidance around these things - a consistent prefix across all my resources, so it's the VM, then its data disk 1, or its public IP, or its NIC, and I can easily associate what belongs to which resource and don't have things left lying around. Make sure everything gets deleted when I'm done; don't pay for stuff that's giving no benefit.

And there's a little bit of help here. I wrote a script and posted it to GitHub that searches for unused disks that aren't attached to a VM, and for public IPs that aren't in use. Azure Resource Graph is much faster than my script - it's near instant and searches across all your subscriptions - and it can do the same: find any disks that aren't attached to a virtual machine, and any public IP address that isn't mapped to a machine. One caveat on the public IP check: my script also checks whether the IP is used by something else, like a NAT gateway or a load balancer; a basic query doesn't, so be super careful it's not being used by something else before you delete it. These will help you find resources that aren't in use - just a little bit of help there.
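As a hedged sketch of those Resource Graph lookups using the resource-graph CLI extension - the projections are illustrative, and the same caveat applies about checking what a public IP might still be serving:

```bash
# Unattached managed disks across every subscription I can see.
az graph query -q "Resources
  | where type =~ 'microsoft.compute/disks'
  | where properties.diskState =~ 'Unattached'
  | project name, resourceGroup, subscriptionId, properties.diskSizeGB"

# Public IPs with no IP configuration attached - still verify they aren't
# referenced by a NAT gateway, load balancer, etc. before deleting.
az graph query -q "Resources
  | where type =~ 'microsoft.network/publicipaddresses'
  | where isnull(properties.ipConfiguration)
  | project name, resourceGroup, subscriptionId"
```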
Virtual machine scale sets take another step: they provide a managed deployment and scaling capability. I have a gold image - a marketplace image or a custom image I've created - and a configuration that says how many instances I want as a base and how many it can have when it bursts out, with scale actions based on schedules or on metrics (CPU above this amount, for example). I can use extensions to add configuration: rather than a highly customized image with my app and everything baked in, keep the image as standard as possible and use something like DSC, Chef, Puppet or Ansible - some declarative technology - to inject configuration into the guest OS and turn that base image into my application. There's a huge benefit here: I can configure the scale set so that if the image it's based on changes - a new version, whether marketplace or custom - it automatically redeploys the VMSS, maybe 20% at a time, as a rolling update using update domains, and it's now running the new image. So no patching: a new image comes out with the latest patches, phenomenal, I don't have to do anything, it rolls out and everything runs from the new base image. I specify the minimum, the maximum and the scaling triggers.

I can also use spot instances. Think about the hotel again: if I pre-book I get a cheaper rate because it helps them plan, and in the same way, if I show up last minute they may have a bunch of empty rooms they're desperate to fill and they'll give me one cheaper - "you can have this room, but be aware that if someone rolls up in a Rolls Royce and offers me full rate, I'm kicking you out." Sure, I'll stay in the room as long as I can for ten bucks a night. Spot instances are the same idea: Azure has spare capacity, which varies by VM type and by region, and to help clear that spare capacity it offers spot pricing, which varies depending on how much is spare and where. I can use spot for regular VMs, for virtual machine scale sets, and for Azure Batch: I say I'll use spot pricing and pay up to this amount, and I get much, much cheaper compute - but I may get kicked out with a few seconds' notice when somebody comes along willing to pay the full price. This is super useful for workloads that aren't time-critical: I want the work done as cheaply as possible, so I use spot for my Batch, my VMs, my VM scale sets. Note that I can't mix spot and regular pricing in one scale set, so I'd have one regular scale set and one spot scale set behind a load balancer pointing at both. Take a look at spot if you have work that just needs to get done and when it gets done is flexible - it's a very cost-effective way to do it.

If I have stateless instances - no unique special snowflakes I care about - maybe I can use ephemeral OS disks. Ordinarily our OS disks live on Azure Storage as managed disks: three copies, highly durable. With an ephemeral OS disk, the disk lives on the host that's running the compute; if the VM is deprovisioned or the host fails, I lose it. But if it's not stateful - it's just a duplicate of a gold image, and anything stateful lives on separate storage (maybe Azure Files, maybe a shared managed disk, maybe blob with BlobFuse making it available) - I don't care if it goes away. I've got lots of instances over different zones and scale sets, so I'll save some money: I'm not paying for the OS disk anymore, I'm just using storage that's already on the host.

That hints at snowflakes, and pets versus cattle. We're used to pets: we name them, we take care of them, we feed them, and if they're unwell we make sure they get healed. That's our regular on-prem workload - the domain controllers - and we're always going to have a certain number of pets. But where possible we want cattle: if one is sick, we just stand another up in its place; we don't name them individually, and we're not going to spend a lot of money curing one - there's no attachment. That's what we want our compute workloads to be. I think it was Jeffrey Snover who said that when we talk about containers, they're chickens - not even cattle. I don't want some unique, special thing about any workload if I can help it: give me a base image and some declarative configuration that makes it that service, that app. That's a much better scenario. VM scale sets get to that point: a gold image and stamped-out copies - they're cattle. Even within a scale set there might be some that are pets - I can do things like "provision in this order" or "don't delete that one" - but as we move into things like containers, they get more and more cattle-like.
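For that kind of interruption-tolerant, cattle-style work, a spot-priced scale set is a natural fit. A hedged sketch - names, image and counts are placeholders, and exact flags can vary by CLI version:

```bash
# Spot-priced scale set for work that can tolerate eviction.
# --max-price -1 means "charge up to the regular on-demand price, never more".
az vmss create \
  --resource-group rg-batch \
  --name vmss-spot-workers \
  --image UbuntuLTS \
  --instance-count 5 \
  --priority Spot \
  --eviction-policy Deallocate \
  --max-price -1
# For truly stateless nodes on sizes with enough cache, --ephemeral-os-disk true
# drops the managed OS disk cost as well.
```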
So what are containers? They're often how we start thinking about running workloads en masse by virtualizing the operating system. Virtual machines virtualize the hardware - virtual CPUs, memory, NICs, storage. With containers we virtualize the operating system: lots of containers run on a shared kernel. We still have controls - isolated namespaces for processes and networking, and resource controls so we don't steal our neighbor's share of the CPU - but we're sharing that kernel. Orchestration is key: containers on their own aren't that useful. I need to think about high availability of the containers, load balancing, health and remediation, resource control, monitoring, network and storage integration, balancing the workloads - there are a lot of considerations, so I need an orchestration solution. Kubernetes won: there are multiple orchestration solutions, but Kubernetes has become the standard.

Azure Kubernetes Service (AKS) provides a free control plane for Kubernetes. Absolutely, I could deploy Kubernetes into IaaS VMs myself and stand up all of those components - but why? I'd just be paying for stuff I don't need: the API server, the scheduler, the etcd database are all provided as a managed, highly available control plane by Azure with AKS, and I don't pay anything for that control plane - I just pay for the worker nodes. So use AKS; don't manually install Kubernetes where you're managing and paying for the components that run the control plane. Additionally, the pods are deployed onto those worker nodes, and I can autoscale the nodes - add and remove nodes from my AKS environment based on the workload that's running - again optimizing my cost: I pay for the worker nodes, and if I'm not using them, shut them down and stop paying for them. I can also burst into Azure Container Instances: ACI is really containers as a service, and the way it works is there's a virtual kubelet - the kubelet is how the Kubernetes control plane talks to the workers and tells them what to do, "deploy this pod" - and the virtual kubelet makes ACI look like an infinite-scale node. So AKS can talk to ACI to deploy instances; if I have some big burst, I could use ACI for that.
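As a hedged sketch of keeping the paid part (the worker nodes) elastic - cluster name, counts and subnet are placeholders, and the virtual node add-on needs advanced networking with a dedicated subnet:

```bash
# AKS cluster with the cluster autoscaler, so the node count (the billable part)
# grows and shrinks with the workload.
az aks create \
  --resource-group rg-aks \
  --name aks-demo \
  --node-count 2 \
  --enable-cluster-autoscaler \
  --min-count 2 \
  --max-count 10 \
  --generate-ssh-keys

# Optionally add the ACI virtual node add-on for burst capacity.
az aks enable-addons --resource-group rg-aks --name aks-demo \
  --addons virtual-node --subnet-name aci-subnet
```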
App Service plans - now we're moving further along. App Service was one of the original services when Azure started, and there are various types of plan available with different features and different limits, so make sure you understand your requirements. Looking at the features quickly: there's a Free tier (there's my favorite free thing again), Shared, and Basic; when I get to Standard and Premium I get autoscale, and the limits are about how many instances I can have - Standard 10, Premium 30 - and with the Isolated tier, the App Service Environment, I can go up to 100. So there are different scale capabilities, and the instances themselves come in different sizes - number of cores, amount of memory, amount of storage - and I pay accordingly. Understand the requirements, pick the plan based on the features you need, and then scale the instances available to you. I can scale up and down - actually change the series and size - with almost no downtime; it's really cool: it creates a new plan of the new size, my apps are rehydrated onto it, and then it flips over, almost zero downtime. I have autoscale at Standard and above. I can point multiple apps at an App Service plan, and depending on the plan I might have deployment slots - so I deploy to prod, I can have staging, and then swap them for a very transparent deployment pattern. They share the same compute nodes: it's not a different set of nodes per app or per deployment slot, so when I scale the plan I'm scaling all the apps and all the deployment slots - important to understand. Then there's the App Service Environment, which gives me dedicated instances deployed into my virtual network - no shared components, higher limits; I can run web apps (Windows and Linux), mobile apps, Docker containers and functions. It costs more because it's all dedicated: regular App Service plans have certain components shared across tenants, while ASE is all dedicated, all mine, in my VNet, so obviously I pay a bit more.

Next, let's get to serverless - this is kind of the utopia when I think about efficiency of cost. Azure Functions provide serverless execution for many types of app: C#, Java, JavaScript, Python, PowerShell and others. This is how I run my PowerShell now - I'm an infrastructure person, but my PowerShell runs through Azure Functions. I can use an App Service plan to run my functions if I already have one, or I can run in a pure serverless (consumption) mode where I just pay based on the resources I consume - CPU, storage, data in and out. I trigger based on a schedule, a webhook, or some other event: a queue, Event Grid - someone sticks a blob somewhere, Event Grid fires my function, I grab the blob, run AI against it to find out what's in it. And I have bindings that connect me to more things: maybe Event Grid was my trigger that the blob arrived, the storage account is a binding so I can read the blob, and my output binding might be Cosmos DB, where I write "this blob had a car in it" or "a human in it," whatever it might be. Then there are Logic Apps: Logic Apps let me build a graphical flow of connectors - pull from this connector, maybe a LinkedIn message, link to this other service for sentiment analysis, then send something outbound, some alert, some message - and I just pay based on the connectors and the actions I use. I'm only paying for the stuff I actually do.

Moving on - and I'm over my time already, as I knew I would be - storage. There are different types of storage account, and for blob I have different tiers: hot, cool, archive. As I go from hot to cool to archive I pay less for the capacity but more for the transactions; the idea is that if I move data to cool I'm expecting to use it a lot less, so I pay less to store it, and if I do interact with it I pay more. Archive isn't even available in real time - I have to rehydrate it back to cool, and there's a retrieval cost to bring it back. Use lifecycle management: lifecycle management lets me apply rules that say, if this data hasn't been modified for 30 days, move it from hot to cool; if it's older than a year, delete it - and combinations of those. Again, it's optimizing my cost: lifecycle management helps me spend less money.
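As a rough sketch of such a lifecycle rule - the account, rule name and day thresholds are placeholders chosen for illustration:

```bash
# Tier blobs to cool after 30 days without modification, archive after 90, delete after 365.
cat > lifecycle.json <<'EOF'
{
  "rules": [
    {
      "enabled": true,
      "name": "age-out-blobs",
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": [ "blockBlob" ] },
        "actions": {
          "baseBlob": {
            "tierToCool":    { "daysAfterModificationGreaterThan": 30 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 90 },
            "delete":        { "daysAfterModificationGreaterThan": 365 }
          }
        }
      }
    }
  ]
}
EOF

az storage account management-policy create \
  --account-name stcostdemo \
  --resource-group rg-storage \
  --policy @lifecycle.json
```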
Blob and Files have performance tiers: the greater the performance, the greater the cost, but it might be required - if I need higher performance, I'm willing to spend the money. Pick resiliency based on the requirements. At a minimum there are always three copies: LRS is three copies in a particular stamp in a particular data center; ZRS is three copies within my region but spread over data centers; GRS is three copies in one data center in my primary region plus three copies in the paired region; GZRS is three copies spread over three data centers in my primary region, replicated to three copies in the paired region; and there are read-access variants of GRS and GZRS. As I move up from LRS to ZRS to GRS to GZRS I pay more money and get better resiliency - but what is my requirement? Don't pay for resiliency you don't need: if my app runs in two regions and replicates the data itself to a local copy in each, why am I also making the storage geo-resilient? Understand the requirements and architect accordingly - maybe I only need ZRS. If I have a lot of on-premises data, maybe in file shares, Azure Files might actually save me money: Azure File Sync lets me replicate from on-prem file shares into a file share in Azure, stored in Azure Files, and I can use tiering to say: this stuff on-prem - when you've got an expensive SAN running out of space - the stuff I'm not using, tier it out to the cloud, so it's stored in Azure Files and not on my on-prem SAN anymore, saving money and adding resilience. If something goes wrong on-prem, with the new AD integration for Azure Files I can access it directly and have the ACLs enforced in Azure.

For managed disks, capacity and performance scale pretty linearly: the bigger the disk, the better the performance. Pick the disk that makes sense based on the capacity requirement and the performance you need - and sometimes that's "I don't need a very big disk, but I need high performance, so I have to pick a bigger disk." Ultra Disk lets me scale those dimensions separately: throughput, IOPS and capacity are three different dimensions, and I pay for each of them separately. If you look at the pricing page, normally you see "at this disk size you get this capacity and these IOPS" going up linearly; for Ultra the pricing is very different - I pay a certain amount for IOPS, a certain amount for throughput, and a certain amount for the actual capacity. So I could have a one-terabyte disk doing 160,000 IOPS, and I can change it dynamically: I can use the REST API to programmatically raise my IOPS for a nightly batch job and then drop it back down again so I pay less money. Make sure you understand all the options available when you're sizing these things, so you're not spending more than you actually need.
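A small sketch of that dial-it-up, dial-it-down idea from the CLI rather than raw REST - disk and group names, and the numbers, are placeholders:

```bash
# Raise the Ultra disk's performance for the nightly batch window...
az disk update --resource-group rg-data --name ultra-batch01 \
  --disk-iops-read-write 80000 --disk-mbps-read-write 1200

# ...then drop it back down afterwards so I'm not paying for peak performance all day.
az disk update --resource-group rg-data --name ultra-batch01 \
  --disk-iops-read-write 5000 --disk-mbps-read-write 200
```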
Also remember, both compute resources and storage resources have their own limits, and they need to be architected together to meet the performance requirements you have. What do I mean by that? Imagine I took, for example, a P60 disk — a premium SSD that can do sixteen thousand IOPS, phenomenal — and connected it to a D4s v3, a VM that can only do around 6,400 IOPS. That's nowhere close to 16,000 IOPS, so I'm just wasting performance; I'm not going to get what I need. I need a bigger VM to take advantage of the IOPS available to that disk. So if you're not seeing the throughput you think you should be getting, maybe the VM is too small, maybe the VM is too big, or maybe the disk is too big. Understand the different dimensions and architect accordingly — don't pay for performance I can't actually do anything with, that's just wasting money.

Databases: use PaaS if I can. If there's a PaaS database option, it's generally going to be better and cheaper than IaaS. I'll give an example — I think it was for Postgres. I've been playing around with this with one of my customers and I did a cost analysis, and using Azure Database for PostgreSQL gives me the same SLA, four nines, as running Postgres in IaaS VMs split over availability zones, but at only about 70% of the cost. You might say, how can that possibly be? A lot of the managed data services use a lightweight container-style technology they can spin up in seconds, so they don't have to run two instances in an active/passive configuration to fail over; they run one compute instance with the data kept separate, and if it dies or has to be updated, they just flip over to another one created within seconds. So it's actually cheaper to run the managed offering, in pure Azure cost terms, than running it in an IaaS VM — and I get seven days of backup included, I'm not paying the same DBA costs for someone to look after the thing, and minor updates just happen for me automatically. So if there's a managed, PaaS offering for a database, generally it's going to be better bang for my buck. Now I may have certain scenarios, some super custom thing, where I can't do that — totally get it — but just make sure you understand what the options are.

Azure SQL Database has numerous tiers and sizes. If I go to the vCore model instead of the DTU (the database transaction unit, which blends compute and storage together), then I pay separately for the compute and the storage. There's even a serverless option for Azure SQL Database where I can pause the compute and just pay for the storage — just like Azure Synapse, what was Azure SQL Data Warehouse, where I can pause the compute and only pay for the storage. There are things like Hyperscale; there are all these different options to help me optimize my costs.

Now, Azure Cosmos DB is actually a challenging one, and it's challenging because it uses request units. It's not that the model is flawed; it's that it can be a bit of a dark art to work out how many request units I need, so you really need to take the time. The good news is there's now an autoscale option — it used to be called autopilot, it's now autoscale — because previously you just had provisioned RUs: you had to guess, I'm going to need this many, provision that many, and then go looking for 429 errors, which mean you've been throttled, or monitor and realize you've provisioned way more than you're actually using. Now with autoscale it will adjust the RUs up to a maximum I set to meet my requirements.
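As a rough illustration of provisioning autoscale instead of guessing a fixed RU value, here's a minimal sketch assuming the azure-cosmos Python SDK (v4, where ThroughputProperties supports an autoscale maximum); the account endpoint, key, database, container and partition key are all placeholders, and exact parameter names can vary between SDK versions.

```python
# Hypothetical sketch: create a Cosmos DB container with autoscale throughput,
# so RU/s float up to the maximum I set instead of being a fixed guess.
# Every name and value below is a placeholder.
from azure.cosmos import CosmosClient, PartitionKey, ThroughputProperties

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.create_database_if_not_exists("appdb")

container = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),  # choose carefully, see the partitioning discussion below
    offer_throughput=ThroughputProperties(auto_scale_max_throughput=4000),
)

# After any operation you can inspect roughly what it cost in request units,
# which is how you tune operations and partitioning over time.
container.create_item({"id": "1", "customerId": "c-42", "total": 19.99})
print(container.client_connection.last_response_headers.get("x-ms-request-charge"))
```

The partition key passed above is exactly the decision the next part covers: it determines whether an operation lands on one partition or fans out across all of them.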
Autoscale is only half the picture with Cosmos DB, though. Like any database at this scale it uses partitions — there are lots of physical partitions that my data is spread over — and I pick how the data is partitioned by choosing a partition key, essentially sharding my data. I need to make sure I shard it the right way and pick that key strategically, so that when I run operations against it, ideally as few partitions as possible can answer the request, because then it uses fewer request units, i.e. less money. If I run a query or operation that has to go against every single partition, that costs me a lot of request units; if I've architected my data and my partitions to match how I run operations, an operation can land on a single partition, and that optimizes my cost. That's why with Cosmos DB you'll actually see some people duplicate the data — storing the data is comparatively cheap — so they duplicate it using different partition keys and use the Cosmos DB change feed to trigger that duplication, based on the different ways they want to interact with the data. The key point here is you have to tune your operations and look at how they're using request units; that's how you optimize your spend with Cosmos DB. Autoscale helps hugely, but you really do need to pay attention to how you're partitioning the data and how your operations run against those partitions to get the best bang for your buck.

And then, just finally, super quick: network. Remember you pay for egress from a region, data going out of Azure. There are exceptions — ExpressRoute Local is a type of circuit where I can only connect to the nearby region, so if I create an ExpressRoute Local circuit in San Antonio I can only talk to the San Antonio region, but the egress is included, and Azure Backup doesn't charge you for egress when you're restoring — but for everything else you pay for egress, so think about that egress traffic when architecting my services. If I peer networks together, there's an ingress and an egress charge for the data that flows between them. I pay for gateways, be it a site-to-site VPN gateway or an ExpressRoute gateway, so consider those different charges when I think about my network architecture. It might be cheaper, and more efficient, to have a hub network with my ExpressRoute gateway and then use peering from the spokes with gateway transit, so they use the remote gateway and flow through that single ExpressRoute gateway, rather than giving every network its own ExpressRoute gateway — and there are limits there anyway. But realize there are costs for peering — global VNet peering costs more — and I pay for gateways as well, so you want to weigh all of those as part of the cost. For internet-facing services there's egress, obviously, but services like Front Door can cache data and accelerate with split TCP: the connection from the end user is terminated at an edge location and then travels the Azure backbone to reach my service. So caching services might help optimize as well; again, think of that in my overall architecture.

Finally — and if you take nothing else away, which is why I've saved this to last — there are many ways to continue the optimization journey; it's ongoing. Azure Advisor brings best-practice guidance around many aspects — performance, operational excellence, high availability, security, and cost — and it's that last bit we're going to focus on. Azure Advisor continually evaluates the resources you have deployed and their use, and after a period of evaluation it makes recommendations. Some of those will be: hey, this SQL database is too big, you should right-size it; you have unused public IPs; you have unprovisioned ExpressRoute circuits; you have unused VPN gateways — there's a whole list of them you can go and see.
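Since the whole point is to actually look at those recommendations regularly, here is a minimal sketch of pulling Advisor's cost recommendations with the azure-mgmt-advisor Python package; the subscription ID is a placeholder, and the property names reflect my reading of that SDK, so treat it as illustrative rather than exact. The same recommendations are also exposed through Azure Resource Graph if you prefer to query them that way.

```python
# Hypothetical sketch: list Advisor's cost recommendations for a subscription,
# e.g. for a weekly review or report. Subscription ID is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.advisor import AdvisorManagementClient

client = AdvisorManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Filter to the Cost category; each recommendation includes the impacted
# resource and a short description of the suggested fix.
for rec in client.recommendations.list(filter="Category eq 'Cost'"):
    print(f"{rec.impact:8} {rec.impacted_value}: {rec.short_description.solution}")
```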
Pay attention to those things. I remember a customer who brought it up in a session, and it was something like potential savings of seventy thousand dollars a year — okay, I should probably go and do some of those things. So look at this weekly; just go and look at Advisor. I can also plumb this into other things: I can create alerts based on the recommendations that fire an action group, which could email me, or I can query the recommendations with Resource Graph. There are different ways to look at them, but definitely, definitely look at them. Understand this is not going to give me architectural guidance — it's not going to say, hey, you're running this on an IaaS VM, you should move it to a PaaS service; it doesn't go to that level. So this is a step, not the only step; you should still be thinking about the other options available to you.

So, in conclusion: we covered a lot of stuff — nearly an hour and a half of video, I apologize — there are lots of considerations, and hopefully you saw that. Maybe look at the blog article for a quicker summary of some of this. It is an ongoing effort: yes, before we migrate, before we deploy, we think about this, and once it's running we continue to think about it. Stay current, because things change. Yes, I've done a deployment, but stay up to date on what's happening in Azure — maybe there's a new service tier, maybe there's a new burst capability, maybe there's a completely new service and I should think about rearchitecting; maybe I can finally move from that IaaS-based database to a managed one because they've added a feature that now meets a requirement it didn't meet before. Be willing to question architecture choices and assumptions. Never assume the person who came before you did the right thing; they may have inherited it from someone else, or at the time maybe it really was the right call — often it's better to deploy something that isn't optimal than to spend forever stuck in analysis paralysis — and then they left and no one ever came back and did that next step. So it's okay to look at what's there and question it: what were the assumptions, were they valid then, are they valid now, what were the requirements, has the business requirement changed, have things evolved, so maybe we should be doing something different now. It's okay to question those things. Now understand, when we do that and we want to modernize an app or change something, that will probably cost money — developer time, project management time — but if that costs X dollars over a one-, two- or three-year period and we're now running optimized services in Azure, it may overall save us money, so you always have that balancing act. If you take away nothing else: look at Azure Advisor once a week. If you only do that, you're still better off than a lot of people, and it will point out some pretty big things for you. But hopefully you now understand some of the different considerations. I put out an Azure update video every couple of weeks about the new features — watch that, it'll keep you up to date. This was super long; thank you for your attention. If it was useful, please like, subscribe, comment, share, and until next time, take care of your money and, as they say…