Microsoft Azure Master Class Part 7 - VMs and VMSS

Captions
hey everyone welcome to part seven of our azure master class this is all about virtual machines and virtual machine scale sets looking at the core attributes of virtual machines our options and then really building on that for things like virtual machine scale sets as always if this is useful please go ahead and give it a like subscribe comment and share and hit that notify so you'll know when i post the next set of videos okay so virtual machines and actually before i go into that i did want to stress a point a few people mailed me hey is there a handout where are the materials if you actually go and look at the youtube where this is if you look at the playlist and below every single video i actually have a link to the github repository in the comments so if you actually go to the github repository if you look there's a folder for every single one of the modules in that folder for example it will have any artifacts like sample scripts it will have a pdf of the handout and it will have the image of the whiteboard on that main page of the github repository i actually have kind of the readme and i have all the extra links to extra videos that would supplement the module that goes into more detail about any particular aspect and so if you want the information please go ahead and check out the github repo that really has all of the information that i cover as part of the class okay so virtual machine basics virtual machines are infrastructure as a service is the building block of many many services in azure and by extension virtual machine scale sets it's really useful to understand the virtual machines because even if we're using app services maybe it still uses vms underneath often we still pick well what's the series what's the size um what are the various attributes we want to use understanding the building block really helps us use the all up service it's often the easiest to relate to as it does closely align with kind of what we're used to on premises so once 
again with a virtual machine we have full access to everything inside the virtual machine's operating system in kind of the foundational class i went through layers and you can really think about well yes there's things like the storage there's the network there's the compute i.e. the servers and then there's the hypervisor i.e. hyper-v and all of that stuff that's azure's responsibility we don't see that then with a virtual machine we do see the operating system any runtimes any middleware any apps and of course our data so we are responsible for all of those things we have full access be it windows or linux i can rdp i can ssh i can winrm but we are responsible for all those things so with the operating system i'm thinking things like patching i am responsible for that things like anti-malware i am responsible for that backup i am responsible maybe there's replication i'm responsible for that configuration inventory and this kind of goes on i'm responsible for all of those things and there's things to help me in azure there's extensions there's services but that's where the responsibility kind of flips over from azure so i'm going to think about this is azure's responsibility i don't see this stuff i just get services offered that are built on those and then everything the os or above well that's you as the customer these are my responsibilities but again azure helps there are things that help me do these things but essentially i have full access to the os and everything above it so with that said because i have all that responsibility i need to plan ahead how am i going to handle my backup what is my disaster recovery what are the policies of my company is it going to integrate with active directory this is a vm at this point just like a regular for example windows machine it can join an ad domain i might have existing agents things like sccm or scom or splunk or whatever that might be it's just an operating system it's just that it happens to be running in azure
it's essentially a vm so i can still use all that existing expertise i have so let's start with kind of the most fundamental element of a virtual machine that's really the series and the size often different workloads have different resource requirements i have some workloads that are very memory intensive some are very compute intensive some that need lots of iops some that are very network heavy some need specialized hardware high performance compute i need rdma network adapters between them maybe i need fpgas i need nvidia cuda cards for ai and graphics for visualizations i might need access to super fast local storage there's a whole set of different requirements and focuses we have for our workloads so step one is i need to understand that i need to understand what is the requirement i have because just like t-shirts i have different fits of t-shirt i have kind of relaxed fit i have sports t-shirts and they come in all these different sizes so i can think about vms are the same they come in sizes within a particular style but there are different styles different series that focus on different things is it compute is it network is it storage etc so we have these different types of capabilities to help meet my requirements so there are a whole bunch of different series in terms of really what is the focus for those resource ratios and i want to just quickly talk about really how that boils down firstly we have to think about there's a physical host so i have some physical host that is going to host my virtual machines now these hosts are deployed in azure in stamps in clusters that are all identical and there's different types of cluster based on the different types of virtual machines it will actually be running but at this host level i can think about well it has a certain number of cpus of a certain version it has a certain amount of memory it has a certain amount of storage now when i think about storage there's a certain amount of local
storage actually on that host so think about well it has some local storage here but additionally it has a certain amount of kind of connectivity it has a certain bandwidth to support a certain amount of remote storage then of course it has a certain amount of network connectivity and then other things it may again have special types of maybe mellanox network cards for big high performance compute it could have those cuda cards for visualization for ai cognitive services but i have a certain set of capability of the host and then we think about okay we create a vm and essentially that vm is going to get placed on a particular host that has capacity and those same dimensions we thought about for the host really apply when thinking about the vm as well so that vm well it has a certain kind of cpu specification we talk about virtual cpus but with physical cpus we talk about cores and hyper threading which really maps to kind of logical processors then we have virtual cpus so we think about well we have this virtual cpu which has a number of different attributes to it as well i can think about well i have a certain number of these i have a certain number 2 4 6 8 whatever that might be then i can also think about well i have a certain type is it intel is it amd then we have kind of characteristics around performance so i think about the performance the way this is measured in azure is something called azure compute units acus and we'll talk about this in a second this is how i can measure the relative performance of different virtual cpus for different series of virtual machines also some of them have kind of boost capabilities additionally maybe they have certain types of features so what do i mean by features well i can think about maybe it supports hyper threading that is exposed for some of the virtual cpus based on what the physical is some of them support hardware assisted virtualization that would mean i can actually run a hypervisor like hyper-v actually in the virtual machine
itself it's not really recommended if i run a hypervisor inside the virtual machine i then lose the benefit of things like azure resource manager a lot of the azure capabilities because now azure can't see the vms inside the azure vm so we lose a lot of capabilities we try and stay away from that in addition there are other kinds of features things like sgx the software guard extensions it gives me that secure capability and there's other things as well but essentially we have different features of the processor also the physical processors of the box there are newer ones over time and so what we'll often see is kind of this version we might see hey a v2 crops up then a v3 then a v4 and that's exposed to us we can see for example hey there's a dv2 there's a dv3 there's a dv4 and you may wonder well why bother why not just i'll take whatever there is why do i care because generally as the versions go up the performance will improve and the acu will actually get better now again you might say well so what sometimes i get better performance that will actually be a problem when you're planning if you for example were architecting your solution and you work it out and on one day you deploy and you happen to deploy to a host that's running the latest processor you get a certain performance then tomorrow when you deploy you happen to get put on an old one and you get half the performance it'll be very very difficult so this enables you to be very specific and say hey i've already got four v2s i want another two v2s i want the consistent performance i don't want this variability because maybe the way i'm balancing i'm just round robin or i'm doing an equal spread so i would get a very inconsistent utilization so the reason the versions are there is sometimes these different features might be exposed sometimes it's just that they have a different performance and i want to kind of know about that then they're going to have a certain amount of memory so that's going to take a
proportion of this so it's going to be some amount in kind of gigabytes then there's going to be a certain amount of storage now this can be many many different elements i can think about well there's a certain amount of local storage for pretty much all of the types some of them don't have it but there's kind of well what is the type for example is it hard disk drive is it ssd some of them also have kind of nvme type storage so this is where hey yeah this host has some local storage what we're actually going to do is map to a vhd often within there for example for the scratch drive so i would actually see that so i would see the type i would see things like the capacity how much of that local drive do i actually get and also there are kind of options around what is the performance what is kind of the write what is the latency so there are some different performance aspects and that comes into kind of what is the type of that local storage maybe some other tweaks around that so i might get different latency for that kind of local storage etc then there's remote so hey i can use remote storage and here i'm focused on kind of disks because obviously there's other types of storage service in azure there's azure storage accounts there's blobs there's azure files well that's accessed by the network component of the vm not its storage connectivity so for remote i can talk about how many disks i'm actually going to connect to what is the max kind of path because for a vm i actually have a maximum number of iops and kind of a throughput and that's important to understand because i might just keep attaching disks to a virtual machine but the vm has its own limits i can't exceed those now sometimes i have a boost capability actually not boost burst is a better word for that we have a burst capability so sure i have a normal amount of maximum iops and throughput but for a limited amount of time i can actually burst up to a higher iops a higher
throughput for that to give me a better performance so i have kind of these remote storage capabilities then of course well we have things like network and once again for network that could be like well how many nics what is my kind of throughput what do i actually support on there so i can think about like bandwidth actually bandwidth is a better word for this so i've got my nics it's terrible today so i've done my number of nics and i can think about what is my bandwidth and this is where if i'm using things like azure storage or smb or nfs that's going to take the network path of my virtual machine and then there's just kind of special stuff i can think about well maybe i have these nvidia cuda cards maybe i have these big high performance compute and i'm using rdma networking maybe i have these kind of big nvme type capabilities so i have this massive amount of super high performance super low latency cache so you have all these different dimensions of the virtual machine and this is where the series and the size comes into play because we have all these different series let's kind of shove that over for a second so we can often think about things like a general purpose so a general purpose has a good kind of balance typical ratios of cpu to memory and there's things like the b series the d series i'm going to forget all those dv3 dv4 so there's different versions as the processors have improved there's the a series most people don't touch the a series anymore the a series is kind of unique when i talk about this local storage the a series uses hard disk drives everything else is on ssd so generally we stay away from the a series these days then i can think about ones that are compute optimized so the ones that are compute optimized well they have a greater ratio of cpu to memory for example the f series then i can think about ones that are memory optimized now these might be very good for example if i had maybe a database for example so things
like the various e series the m series and the dv2 they have a greater ratio of memory to cpu then the ones that are storage optimized so when i think about storage optimized well it's the l series the l series it has things like big throughput for the vm but it also has those kind of nvme super super high performance local storage now it's ephemeral if it's in the host it means it's ephemeral i.e. if the vm is de-provisioned or it crashes i'm going to lose the content so the way we would typically use that kind of nvme storage is when it starts up it would populate it with stuff like a cache and use it from there so if i need some kind of cache that doesn't have to be durable if i lose it hey the actual data is persisted somewhere else but i can get this very high performance local kind of cache then there are for example the gpus this would be kind of the n and there's a whole bunch of different n's i think there's an nc for compute and there's an nv for kind of visualization and nd for deep learning so slightly different kind of combinations based on what we want there and then there's high performance compute so the high performance compute ones have those kind of mellanox cards so we said hpc and that's the h series and with all of these there's different like variants different versions often you'll see an s variant the s variant means it can use premium storage there are other types of variant in there as well but essentially that's what the series are based on different focal points based on what the workload is going to be and then within the series we have sizes where all of the different aspects proportionally go up within that series kind of set of ratios so that's the whole point behind it if we actually look so we'll jump over and actually take a look at these in a bit of detail so this is showing me all of the different virtual machine sizes and again it's showing all of those the general purpose with kind of that balanced cpu to memory
ratios it's showing me the ideas behind those compute optimized memory optimized storage optimized ones with gpus and the high performance compute so all these different types available and then the point is if we go and look at some of these actually look at the memory optimized it's a bit more interesting for any of these we'll actually see there's a whole set of different sizes notice here and for all of them there are ones with an s and ones without an s this is about can i use premium storage or not so here we look at the ev3 and the esv3 and here is where we see all the different sizes so we can absolutely work out okay we can see this kind of size column here and you can see the cpu goes up the memory goes up the temporary storage goes up the max data disks go up the temporary storage performance goes up the nics and the bandwidth go up and they go up proportionally because essentially what's happening is you're taking up more and more of the underlying host so you get that percentage worth of the underlying host's actual resources and for all of these you're getting kind of the full set of resources i get all of that cpu i pay based on the virtual machine i'm not getting kind of a discount if it only runs at 10 percent cpu unless i look at the b series so the b series is burstable for the compute and the way this really works is you'll notice i get a virtual cpu but we get this additional base cpu perf of the vm now this is based on a single cpu which is why it goes like 200 percent because here we can see it's got 12 cpus so i get 200 percent of what would be one cpu so essentially it's almost like a sixth of a cpu per virtual cpu i see in that virtual machine and then you'll notice there's a max cpu performance next to it so what i can essentially do is here well you can see i can actually go to 2000 percent when it has 20 virtual cpus so i can actually burst to 100 percent of all 20 for a period of time and what really this is doing is it's like a cell phone plan the idea being that for
that cpu instead of having a hundred percent of the cpu and paying for 100 percent of the cpu all the time you can kind of think about per cpu i get a certain percent so if there's zero and a hundred maybe what's allocated to my virtual machine is ten percent so that's what i'm allowed to use now if i'm actually only hovering around one percent here what we actually do is we start with a bucket of credits but we actually start accruing more credit so i'm actually accruing credit so i can actually go well beyond 100 but let's just we'll draw up here so i accrue credit and then something happens well assuming i've accrued credit i can actually burst out up to 100 percent for a period of time and then i'll drop back down and so when i'm bursting up you can imagine well i'll start consuming my credits and then when i go down again i'll start accruing them again so if i had some workload that's more variable maybe it's domain controllers they have a burst at certain times some kind of website that gets a burst of traffic the b series can be super useful there because it's cheaper i'm only getting ordinarily a certain percentage of the cpu but hey i can go and burst up as required assuming i have accrued enough credits so the b series is really kind of unique there for all of the other vm skus i'm really just paying for the cpu all the time there's no over commit i get that virtual cpu's allotment of azure compute units if i don't use them i'm still paying for it there's no kind of variation in that so the b series is very unique in that it gives me that ability to burst up and save some money by buying kind of a cheaper vm now as i talked about within the series there are sizes all the different elements go up proportionally based on its focus now some virtual machines actually have a cpu constrained option now you might wonder what's this all about so if you think about it for a second we have all these different skus of virtual machine
and maybe there's memory optimized and obviously there's even bigger ones there's things like the g series i've missed them out there somewhere the godzilla the m series and they have lots of memory but they still have a lot of processors but if i have some workload where i pay kind of on a compute core basis like oracle or sql server maybe it's so memory intensive i need a massive amount of memory i really don't need the associated cores that come with it even for things like those memory optimized ones so let's take a step back let's look at these memory optimized for a second and we can see let's look at some of these big ones the m series for example well on this m series it's great i have more memory than kind of cpus it's giving me those ratios but it's still a lot of virtual cpus if i want to get to for example a terabyte of memory i have 64 virtual cpus there that means i'm paying for all of those cpus for my oracle for my sql so what's available are these constrained virtual cpus so i still pay for the regular size but it's essentially going to hide cores from me so here we can see those same m series and we can see here for example hey it's the same as an m64 but it's only going to expose 32 virtual cpus to me so if i need the memory but i do not want all of the virtual cpus associated with it i can actually hide them i have that ability to say yeah yeah i need everything else the iops the throughput storage capacity the memory but i really don't need all those virtual cpus and it's not even i don't need them i don't want to see them because then i have to license them as part of my software so those constrained options are really useful when i have those more strict requirements about having to license every cpu that's inside that virtual machine so constrained hides a bunch from me so let's talk about the azure compute unit and this is essentially how we can measure really what our overall performance is so let's kind of look at this
document in a bit of detail we start off with the a series the first virtual machines and in the a series the good old a series it was essentially a score of 100 per virtual cpu all of the other scores are then proportional to that so i could see for example well the dv2 over here well it's 210 to 250 so it's at least double the performance now the reason it's kind of two numbers it's variable is this little asterisk which is saying it supports the intel turbo technology so here we go acus use the intel turbo technology to increase performance for a period of time likewise there's an equivalent technology on amd so what we're seeing here is okay i can see proportionally the performance now if we keep going we see well that's weird the dv3 actually dropped so the dv3's performance is lower than the dv2 like what terrible batch of processors did they possibly buy to make it lower and it's because you have to look at this other column what this is showing you is the dv3 introduced the concept of hyper threading so now each virtual cpu was essentially now mapping to hyper threads where the physical processor has two threads it doesn't double performance but i get an increase in performance so the dv3 proportionally adjusts the cost based on that so each virtual cpu is slightly lower in performance but i get double of them so that's why we see that but then again it goes back up with the v4 but this is where i can go and see what is my actual performance of those processors here the h series goes up to 300 and the hc i think that's the king at 315
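To make the ACU comparison concrete, here is a minimal Python sketch. It uses only the two scores quoted above (A series at 100 per virtual CPU, Dv2 at 210 to 250); the table layout and the function names `relative_performance` and `vm_acu_range` are hypothetical and purely for illustration, not an Azure API.

```python
# ACU scores per virtual CPU as quoted in the walkthrough: (low, high).
# A range means the upper figure is only reached while turbo is active.
ACU_PER_VCPU = {
    "A": (100, 100),    # the baseline every other series is scored against
    "Dv2": (210, 250),
}

BASELINE_ACU = 100


def relative_performance(series):
    """Return (low, high) per-vCPU performance as a multiple of the A series."""
    low, high = ACU_PER_VCPU[series]
    return low / BASELINE_ACU, high / BASELINE_ACU


def vm_acu_range(series, vcpus):
    """Return the (low, high) total ACU for a VM with the given vCPU count."""
    low, high = ACU_PER_VCPU[series]
    return low * vcpus, high * vcpus


if __name__ == "__main__":
    print(relative_performance("Dv2"))
    print(vm_acu_range("Dv2", 4))
```

The score is per virtual CPU, so a four vCPU Dv2 is compared against a four vCPU A series by multiplying both by the same count; the ratio between series stays the same at any size.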
i think that's kind of as good as it gets right now so that's kind of the highest performance which makes sense because the h is the high performance compute one thing i do want to stress is there is no core pinning with azure if i think again that the vm has kind of virtual cpus the physical box has cpus it's not permanently pinned that virtual cpu is not always this particular physical core underneath it's hyper-v the hypervisor is constantly working out where do i run this particular instruction now it understands numa the idea that there's banks of memory attached to certain physical processors so it will keep it within there but outside of that it's not always pinned to the same physical processor but that's the hypervisor's job the hypervisor's job is to do that scheduling working out but the acu is reserved for me it may not be an entire processor if you think about some of the smaller vms and physical processors improve over time my acu of 160 may only be half of a physical processor so i don't necessarily get the whole processor but the way they work out capacity is they don't over commit in terms of those acus there's always enough physical processor capacity to handle the acu values of the vms associated to it so for the b series burstable vms the spare capacity is not guaranteed so that i could always just burst up whenever i wanted to okay so the vm building blocks so in the azure resource manager there is way more than just the virtual machine so obviously there is the virtual machine we have the vm and essentially the vm we set a certain series and it's a certain size but even within there not every vm is available in every single region if you actually go and look at kind of like the azure regions and by service you can actually go and see all the different services and work out well where is this particular series available especially some of the more exotic ones some of the n the h series the m they may not be in every single region so you have to go and work
out well which region is this in and then even if it's within a region it doesn't mean it's available in every single physical data center every cluster and why that's important is if you think about capabilities like proximity placement groups a proximity placement group if you remember is all about the idea that hey yes i have a certain region within that region there's kind of multiple physical facilities and for example availability zones and then a proximity placement group is hey i want to get things as close as possible so i create a proximity placement group and that pins it to wherever the first vm goes and that's important the point of the proximity placement group is it's bringing things in within a certain very small latency window so if the first vm i create in a proximity placement group is some generic d series it's going to be put maybe in a certain set of clusters that support the d series but don't support the n or the m these exotic ones so i want to make sure my proximity placement group gets pinned to a set of clusters that support that exotic fancy vm i know i want to use so the first vm you create in the proximity placement group should be the biggest the most exotic so if i'm going to use an n series i'll create that first or my g or my m then i can go and add the d series the e whatever the others are doesn't matter as much but make sure your proximity placement group gets pinned to parts of the region and the data centers that support those fancy vms you know you want to use later on now additionally we have this thing called spot and if we think back to those regions they have banks and banks of servers within those servers there's a whole set of capacity and they're not full azure makes sure there's a certain amount of spare capacity for when things happen things fail you have to spin things up fail over et cetera but at the same time you don't really want things sitting there just completely idle what would be great is if you could have some
stuff running that if you need the capacity back i can kick the other stuff off it's less important and that's exactly what spot is so with spot it's basically the ability for me as the customer to get compute capacity really cheap but it's really cheap because at a moment's notice i think it's 30 seconds something like that if azure needs that capacity back because a full paying customer wants it they can kick me out now that's fine if i have some kind of interruptible workload maybe it's rendering it's some kind of batch job that i need to get done but i'm not in a huge time crunch i'd like to do it as cheaply as possible spot is phenomenal for this and one of the nice things they've recently done is actually if we go and take a look let's close these down when i actually go and create a virtual machine now i can go and look at the history of the spot i can see the price of spot in regions around me and what's the chance they're actually going to kick me out so in here i'm going to go and look at my virtual machines and we'll just say hey we're going to create a virtual machine when it comes back there we go so we'll say add a virtual machine and if i scroll down i can see this azure spot instance and i've got it set to no and this is where i would just pick the regular size it shows me how much it is a month i can go and see all the sizes that are available in this region so i have west us i could see everything that i could see in west us and it's going to show me all of those there but here i can say well i'm going to use azure spot and if i select this now i have a choice i can say hey just capacity only and what i mean is i'll get it as cheap as i can but they can kick me out if someone else needs it more or i can say i'm only wanting to pay up to a certain amount and i can actually say well should they just stop and de-allocate it so i stop paying or actually delete everything associated with the virtual machine so here we can see normally this d2 is a
certain price per hour so let's look at the view pricing history and now i can see well okay i've got the region i selected this is kind of the price currently for this spot price west us is a different price central or west central which is actually cheaper but i can also see the eviction rate so i can see here there's only a zero to five percent chance i'm going to get evicted whereas west central there's a five to ten west us two is a fifteen to twenty so i kind of like this idea of west us zero to five percent eviction chance but the price will go up depending on what else other people are doing so i can put in a price here of what i'm willing to pay so i might say well i'm willing to pay i don't know what the normal price is say this is 30 cents an hour i could say okay once again if i turn off the spot pricing let's say no so now it's 85 a month so you'd have to kind of go and back work out the normal hourly prices but essentially i'm getting it for cheaper the whole point of this spot pricing is hey i've got something that's not in a super rush i just need to get the thing done well this is a nice option for that then we have nics so we have network interface cards if we start building out a view of what is our virtual machine we can think about okay yes i have the vm and that vm is a certain series and a certain size then i attach to it one or more nics now the number of nics depends on the series and the size within the nic there are ip configurations and i can have multiple ip configurations there's a private ip and then optionally i might have kind of a public ip and then that nic well it goes and attaches to i can think about i have a virtual network and that's made up of various subnets so maybe that's subnet three and there's subnet two and a subnet one i could have another nic maybe it's some kind of network virtual appliance and it could attach to a different subnet but it has to be in the same virtual network i cannot span
virtual networks and these are all within the same region so i cannot have a vm in one region connecting from a network attachment perspective to a network in a different region it doesn't mean i can't talk to things in another region absolutely this virtual network might have global vnet peering into another vnet and then via ip sure i can talk to the stuff but my actual network interface cards they have to talk to a subnet in a vnet in the same region and if i have multiple nics they have to go to subnets in the same virtual network i can't span them now at this point though this is just a regular ip configuration it's a virtual machine there's a guest os so there's an operating system running in that thing well now that os has these ip addresses if i'm connected to this network i could rdp to that thing i could ssh i could go to web services it's just a regular network configuration at this point i can do anything i would do with a regular network configuration now remember there might also be things like an os firewall so just like you would on a regular operating system before i can get to it i might need to open up exceptions in that firewall that's running inside the vm in exactly the same way i might have network security groups which we talked about in the networking module of the master class they might be protecting traffic that can get to that subnet as well so i have to think about the different layers of security to make sure the traffic is going to flow as i actually intend it to and then additionally there's other types of components there's things like load balancers i might have some kind of load balancer in here and i can think about well i have kind of a front end and these nics can actually be added as part of kind of my back-end set so they can be part of a load balancer or app gateway back-end set to get traffic as it comes in to the service so i have my networking constructs of this
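The NIC attachment rules described above (every NIC on a VM must attach to a subnet in the same virtual network, and that network must be in the VM's region) can be sketched as a small check. This is only an illustration in Python: the `Nic` type, the `check_nic_attachments` function, and the resource names are all hypothetical and are not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nic:
    """Hypothetical, simplified view of a network interface."""
    name: str
    vnet: str
    subnet: str
    region: str


def check_nic_attachments(vm_region, nics):
    """Return a list of rule violations; an empty list means the layout is allowed."""
    problems = []

    # Rule 1: all NICs on one VM must be in the same virtual network
    # (different subnets within that network are fine).
    vnets = sorted({nic.vnet for nic in nics})
    if len(vnets) > 1:
        problems.append("NICs span multiple virtual networks: " + ", ".join(vnets))

    # Rule 2: each NIC must attach to a network in the VM's own region.
    for nic in nics:
        if nic.region != vm_region:
            problems.append(f"{nic.name} is in {nic.region} but the VM is in {vm_region}")

    return problems


if __name__ == "__main__":
    nics = [
        Nic("nic1", "vnet-a", "subnet1", "westus"),
        Nic("nic2", "vnet-a", "subnet3", "westus"),
    ]
    print(check_nic_attachments("westus", nics))
```

Reaching resources in another region is a routing question (for example global VNet peering), not an attachment question, which is why the check only looks at where the NICs themselves sit.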
then i have an os disk now ordinarily it's going to be a managed disk we want to use managed disks there is unmanaged storage as well again we talked about that in the storage module that's where it's a page blob i have to worry about the iops per storage account it's not a first party arm resource i just lose a lot of capability so i'll use a managed disk and that's going to go and be connected to the virtual machine so i can think about great i have over here i'll do a different color a managed disk that is an azure resource now behind the scenes absolutely that's sitting in kind of an azure storage account there's a storage account there hosting it i just don't see it and that is connected to the virtual machine for the os so this is durable there's at least three copies so there are three copies of this thing now there is the option to have ephemeral os disks ephemeral means remember this vm is running on a host if we go back we think about here and that host has some local storage in addition to kind of that it also has kind of a cache that it can use for certain things to speed up performance so ephemeral disks actually can use the cache and even now that temporary storage to store os disks this is very useful for virtual machine scale sets if i have some workload that has no state nothing i really care about long term instead of using a managed disk for the os that takes a little bit longer to spin up the latency is going to be longer obviously the managed disk is not local to the node so i'm going to get an increased latency if there's no state i care about it's completely cattle it's a tin soldier if it falls over i just put another one in its place some of the vms support ephemeral for the os so i'll actually use the cache and if there's not enough cache they now also support letting me use my temp storage and i will actually put my os drive there obviously if something happens to the node or i deprovision it i lose the state i lose everything i have to just recreate
it from scratch again but they will provision faster and they will have a lower latency so again if i have a stateless workload it's actually a good solution to leverage that so if i have virtual machine scale sets um perfect for that scenario then normally normally um i have a temp drive there are now some virtual machines that don't um there are now i think it's the v4 of the d and the e there's a dd that does have temporary storage and then there's just the one without the extra d that doesn't have temporary storage but nearly all of them do and what i actually have here is i think about well remember this thing again is running on an actual host there's an actual physical box running this this is my host it has kind of remember that local storage of its own and it attaches basically a vhd on here for temp again we talked about that in detail in the storage module and both linux and windows will see kind of this scratch drive but again it's not durable if the node crashed if i deprovision my vm and then restart it again i'll lose the content of that there's a big data loss warning on that disk so if we quickly jump over and if i look at explorer so this is a virtual machine and we can see here i actually have this temporary storage drive d it has a file on it data loss warning readme i can look at this this is a temporary disk any data stored on this drive is subject to loss and it's trying to tell you as clearly as it can don't put stuff on this you care about and windows sticks the page file on it by default it's just hidden if you needed like a scratch drive it's great it's high performance it's local definitely use it for that but this is not a data drive do not put stuff on this you actually care about and we'll come back and use this again a little bit later on so we have that temporary drive as well and then just basically well i can add additional data drives and these will be of different types again depending on the size the type of the virtual machine if it's
like an s i can use premium and there's also ultra and once again it's just more managed disks once again they're in their own storage accounts it's just hidden from me i can have other managed disks i can have multiple of these the number depends on the series and the size of the vm and i can just add those as data so e f i can use d if i need to instead of the temp it involves kind of moving the page file temporarily to the os drive rebooting changing the drive letter of that but it can be done now for both the os and the data i can control the caching configuration so by default the os is always read write which we generally want for an operating system for data though there are sometimes we don't want any caching i want it to go straight through to the disk so we can configure this so if we just come out of this for a second so if i go and look at an existing kind of virtual machine doesn't really matter what one if i look at my disks so i can hot add i can actually add additional disks while it's running go ahead move but we can see here my host caching is none this is my domain controller this is where the active directory database is i do not want caching on this so i have it set to none but you do have options likewise for the os i've got it set to read write but again you can change that caching option so understand what your workload is understand what the right caching option is and then tweak that configuration based on what you need then there's extensions so extensions are where we really start to come into all of the power of azure helping me yes i'm responsible for it but there are a ton of kind of extensions i can add here there is always kind of an azure agent that runs inside be it windows or linux there is an azure agent there this helps do some of those core things but there are extensions around things like anti-malware there's extensions for backup there's extensions for config there are things for kind of replication so i can easily replicate it to
another region you name it there's probably an extension there to do it so once again if we quickly just go and look at this virtual machine we can actually jump over and look at our extensions so you can see here i've got kind of a custom script extension this is phenomenal this lets me take a script it could be powershell cmd a bat file if it was linux i can have bash it's running under the native guest os it will basically funnel that script in to the os and execute it it's great for provisioning i can make it go and do things but i can also do it while it's up and running i can at any time trigger that custom script extension to make it do something you can see i've got anti-malware i've got various kind of monitoring agents that hook into things like log analytics if i go to add there's a whole bunch of other types of agents from other vendors for different types of capability now in addition to that we can see things that are delivered through the azure agent through kind of custom extensions i can see hey you can automatically shut down at certain times again shut down in terms of deprovisioning that's when i stop paying for it if i just shut down within the guest os i'm still paying for it because it's still provisioned on a host i actually want to shut it down so i stop paying then i just pay for the storage i can make a few clicks make it go and back up to a recovery vault i can set up replication to a region i pick so this uses azure site recovery it puts an agent the mobility agent inside the guest and now i can make this replicate to another region i can actually go and configure things like automatic update for the operating system running inside there's inventory i have this nice little run command this is a cut down version of kind of the custom script extension this is just i want to run a command so i could do a custom command and there's things to help me hey i've locked myself out of my vm i could maybe reset certain firewall reset the rdp
ports so if i've done something silly and i've locked myself out i can fix it i can enable administrator accounts so these are just using a run command extension to do things inside of that virtual machine and again also some of those give me insights to other pieces of information about the virtual machine and if it was linux i can have full serial console access and actually interact on windows it's not so fancy because it's windows i can do things like boot diagnostics and actually get a screenshot of what is the current session zero kind of that boot console of that virtual machine maybe it's stuck it's not booting properly i can try and kind of go and see what it's doing there's kind of those reset redeploy maybe something's broke you'll just redeploy the vm back onto the fabric i won't lose the content remember the disks are really where the state is stored it's going to just redeploy that if there's something kind of sick about the virtual machine and i'm showing all of these things from the gui ordinarily i want to use kind of infrastructure as code i have an arm template i have terraform and that's where i've defined all of these extensions to bring in and add capabilities to the actual virtual machine but go and look at the extensions that's phenomenally powerful might have a public ip hopefully not really i want public ips on load balancers app gateways and then have the vm as part of some kind of back end set that's the more secure option it might have infiniband it might have gpus it might have nvme storage it depends on the series but it might be there depending on that series and size don't forget about things like availability sets availability zones when i create my vm if it supports availability zones i can pick which zone or i can pick which availability set if i want things close together i want lowest latency i can also say hey i want to use a proximity placement group i mentioned that earlier that's bringing things together super close together
in the data centers it's going to reduce latency from a networking hop perspective the racks they're connected to the switches i'm not covering this here go and look at the resiliency module i cover that in super detail there but realize that's an important part of your design i might have a managed identity now i'm going to cover managed identity when we talk about kind of security but essentially the managed identity works by this vm this can actually if you think about there's azure ad i can actually have an identity for this particular azure resource and the way this works is you can kind of think about it well okay i've got the virtual machine then i have azure ad so that identity is now stored as another type of object it's an identity a security principal in azure ad now processes so if i have some process running inside that vm only in that vm or that app service that function whatever construct is using it i can say hey um i want to authenticate as that managed identity i don't have to do any credentials i'm in the vm so i can automatically get a token that represents the identity of that virtual machine then when i try and access some other service as that process what i've done in advance is when i think about kind of the iam access the acls i gave hey if this is kind of vm1 so there's a managed identity here for vm1 i said hey vm1 you have contributor access so now the process can take that token use it to talk to that storage account and is talking as vm1 i would now have contributor i don't have to try and store credentials anywhere in my code or anywhere else it's inherently who i am because i am vm1 so i can turn that on as part of the virtual machine we can quickly go and say hey identity i can do system assigned or user assigned if i just say on it will then go and create that object in azure ad processes inside can then use it and then for my resources i can just go and give it access so super quickly just to kind of show an example i
think um i'll just pick a storage account it's the easiest way to do it if i go to my access control i want to add a role assignment it doesn't matter what role i'm doing but here i can say hey managed identities for different types of service i don't think i have any for vms oh i do i could just pick this one so now any process running in this windows 10 vm i have could go and get the token and it would have contributor rights to this storage account nothing they have to store nothing they have to do it's just inherent to that process so managed identity is actually super powerful and super useful so these are all the aspects of a vm so it's not just this thing it's all these components and make sure you remember there's all these components if you delete a vm delete the disk delete the public ip make sure the nics get deleted it's really useful to use resource groups i have a resource group not per vm but per kind of service but all of the resources for that vm would be together it makes it a lot easier to see them use good naming in my arm template when i create a new card have it with a vm name dash nic one for the managed disk vm name dash os disk vm name dash data disk one make it pretty obvious from the naming what is used some of the naming the azure portal does is not that great which is why again it's better to do it from kind of an azure resource manager template even powershell or cli is going to be better so i have all these different aspects now when i think about supported operating systems it's windows only i'm kidding so yes obviously it supports windows um there's an article if i go and look at this article it will go through all the different versions of windows it actually supports all the way back to windows 2003 which seems kind of bizarre but you can actually deploy that on azure hopefully no one has 2003 um but you absolutely could for 2008 r2 uh and above there's images in the azure marketplace with the azure agent everything you need is
actually extended support for kind of 2008 r2 and beyond it says there's no standard image for 2016 you would just use datacenter and the licensing is part of it there's no functional difference between standard and datacenter so you would just deploy datacenter so this talks about all of the os's i can use it even talks about certain roles and how they're supported running actually in azure so that's the windows side and then obviously there's linux as well i think it's two-thirds of azure is now running linux workloads when you think about things like containers uh linux linux linux linux and there's a massive number of linux distributions supported as well so if we go and have a quick look now there's actually there's ones that are supported and then there's kind of azure enhanced versions that have not just the agent installed but other kind of customizations for azure so we see these azure tuned kernels so these are closely working um with the microsoft team and the linux distribution microsoft puts a lot of code into these and it's really tuned to get the best possible performance out of azure it tends to be updated more frequently but we can go and see here all the ones that are supported and it tells you whether or not the agent is just there or if there's a different package you have to go and get so it gives you all of the details about drivers and the agent to make it work properly in azure so the key point is definitely not just windows there's a huge number of images in the azure marketplace uh ones for just the base os and the thing that's nice about these is that they'll release new versions with the new patches so as hey the monthly patches come out there's a new version of the image and that versioning is super powerful because we'll talk about this later on there are ways that some things will actually auto update a deployment when there's a new version of the image i don't have to patch it it's like hey there's a v5 and let's update everything
that's deployed to v5 and then v6 and then v7 so there's a massive number in the marketplace base os and also with apps installed so it might be configuration it might be pre-baked in the image they're there in the marketplace so go and take a look at what you need i can create my own images there's different ways to do this i can use things like um azure image builder this is packer really behind the scenes windows and linux take configurations build it into an image and you spit out your own custom image i might use devops i might have a devops pipeline that can go through build an application and inject it in uh spit out an image i could create a vm in azure do stuff to it and then capture it to an image there's a bunch of different ways but what's really nice is to use this shared image gallery when i use the shared image gallery it stores my image it has versioning so i can do a new version of the image so that idea later where i'll talk about hey i can update automatically for my own images i can have versioning via the shared image gallery so hey my custom image v2 it's got latest patches in roll this thing out the shared image gallery can be geo-replicated so hey i want to create vms in five different regions i replicate the shared image gallery to those five regions so it's locally available it will automatically scale as needed to make sure it can deal with however many copies are trying to happen at any moment to actually use that image i can share the image gallery across subscriptions i can even share across azure ad tenants if i have the right permissions in place so this is super super powerful definitely if i'm using my own images use the shared image gallery and then install my apps inside just like anything else it's a vm the question is can i install this in azure if it's your own custom thing the answer is probably yes now if it's a third-party app you may need to check do they support it in azure in the end it's just an
os but check with the vendor do they support it in azure agents hey if today i'm using scom on-prem for monitoring can i install scom in this yes there's azure monitor logs there's capabilities in azure that might be able to replace it but if i'm comfortable with where i am today i don't have to i'm using sccm to patch things today sure go ahead again there might be better options i can natively update but sccm has great granularity and reporting maybe i want to keep that so then with sccm i'd probably deploy distribution points or other types of element of sccm to make it more efficient for the vms in azure hey i want to join them to domains great stick some domain controllers in azure improve their performance change the dns of the virtual network to point to those dcs in that azure v-net there's nothing special about this they're vms so i want to talk now about maintenance considerations and yes there's maintenance and i'm focusing on the azure maintenance not the maintenance inside your virtual machine your patching of the guest os that's your job that's your responsibility that's you rebooting the vm again things in azure can help you there's automatic patching for guest os's but i'm talking about the azure fabric it has some maintenance it has to do those hyper-v hosts that are essentially running your vms well they require some maintenance as well so i want to talk about what does that mean to your virtual machine now there's various types of issues and maintenance your vm may encounter i think about planned maintenance and for planned maintenance there's planned maintenance that from the vm's perspective the vm doesn't have to get rebooted or maybe your vm does have to get rebooted now when azure does planned maintenance it rolls it out by update domains so we talked before about things like availability sets so if you remember with availability sets let's come over here we had the idea there was a certain facility that facility had racks of servers and i could kind of
think about each rack was kind of a fault domain fault domain zero fault domain one fault domain two now additionally to the three fault domains we typically see there's also something called update domains which can be between 5 and 20 for azure resources and the way that really works is as i deploy not only am i spread over racks i'm going to spread over these update domains this is kind of update domain one update domain two update domain three update domain four etc and so resources also get balanced between the update domains if i'm using multiple availability zones each zone can be thought of as an update domain if i just do a zone redundant or it might have these within a zone but you can kind of think about this is how azure is doing kind of its maintenance it will never take down more than one update domain at a time and it waits i think 30 minutes between update domains as it rolls out its updates so what that means is if i had five update domains at any particular moment twenty percent of my capacity if i had five instances or one fifth might be unavailable now if it's that sort of maintenance we will get told maybe you could increase the scale for that window if you kind of know it's coming now what i would say is generally there is no impact azure has done these phenomenal changes to the os they can replace pieces of the kernel while it's running generally there is not any impact to the virtual machines they have a technology called vm preserving host update vm phu and what this does is azure doesn't patch their hosts instead they kind of they inject they replace bits of the kernel while the thing's running but what it could actually do is even if it had to do something more drastic it would freeze the vm in memory maybe do whatever it had to do then unfreeze it typically that's 10 seconds 30 seconds maximum now also what they've started to do they didn't use to do this but if you think about your host there's your host it's running a
certain let's say v1 and then there's your vm and your vm is obviously attached to this local vhd for temp if it's now going to potentially be a longer it needs to reboot the host what they actually now can do is well there's another host and this host is already running v2 it's already been patched your vm doesn't require a reboot as part of whatever this update is it doesn't impact so what they will actually do is a live migration if you're a vmware person this is vmotion it will live migrate the memory so it's running it will copy the storage and then once it's kind of finished copying the storage the ephemeral disk and the vm it will switch it over a reverse arp goes out for the software defined networking and now this kind of goes away it's running over here there's a few seconds of freeze because it has to copy the final bits of memory but it's super super small now it cannot do that for everything um it cannot do it for like the g series i think the h series i think i wrote it down m and n so that's not the g h m and n they're i mean they're just i think too big um so like the nvme for example um and the other components on the n the cuda cards it just can't live migrate that stuff but for other series depending on the type of maintenance that's coming they'll actually live migrate that now that's if it doesn't require a reboot of your virtual machine what if it does require a reboot so if it requires a reboot of your virtual machine you'll actually get kind of a notification so what's going to happen then is we talk about this scheduled maintenance so with the scheduled maintenance they're going to tell you in advance they're going to say hey look in this time window normally i think it's like 30 or 35 days time we are going to have to reboot your virtual machine that might be fine for you or you might elect to say no that's really not okay um i want to control when that vm reboot happens so what you can actually do is self-service and with self-service
what actually happens is you will pick when you want that reboot to happen of your vm and what essentially is going to happen is you're going to lose your ephemeral storage it's going to reboot but if you imagine you're currently running on this v1 host it does the same thing except there's no live migration you'll say hey i want to do self-service restart now it will shut down and then restart you on a host that's already been patched so you have control of that kind of scheduled maintenance um again if you opt not to do that maybe you don't want to lose that temporary storage now remember it's ephemeral anyways you could lose it another time but if you opt not to do that self-service and just restart onto a host that's already patched within this time window they've told you about it's gonna reboot and i think it can actually be up to like 25 minutes again it's only one update domain at a time your service should have multiple instances at least three over different fault domains so it's not like my service goes down i just lose a certain percentage of my capacity during that maintenance operation now you might wonder well how do i know and again this is super rare this maintenance where it has to actually reboot is very unlikely now they can patch the kernel and keep the thing running and just do this vm preserving host update freeze or if supported it can actually live migrate you so you'll get an email well in advance saying hey um this is coming there's also so when i talked about this vm there's something called the metadata service so if i think about kind of my virtual machine there's this little cloud here and it's the metadata service and as the vm i can go and query this metadata service it's a fixed ip address i can't remember what it is off the top of my head and we'll see in a second it's like a 168 i think and i can find information out about me as a vm what scheduled maintenance is coming and it will tell me i can
also go and look at kind of the help and support so if i jump over to the portal first i'll show both of these so let's close this down super quickly so if i go to help and support if i go to service health i can see planned maintenance i could also see any issues around azure in general and i can see there is none affecting my subscription now within the actual virtual machine which is why i've got this up so this is this special ip address oh it's 169.254.169.254 that should have been easy for me to remember it's the same for every vm no matter what the network it's always this 169.254.169.254 and i can query we look at this query right here this will ask hey are there any scheduled events so if i run that my response is blank so there are no scheduled events for my virtual machine now i can do many other things if i go again let's kind of look i could just look for general information about my virtual machine and what this has done is make this a bit bigger it's giving me a whole bunch of info about the vm that's hosting this particular os instance there we go so i can kind of see information this is all about the vm itself so i can see hey yeah that it's in the azure public cloud i can see my region i can see the name i can see the image i was built off of i could see my fault domain my update domain this is not in an availability set which is why they're kind of all zero i can see lots of information storage tags um version information network information it just gives me tons of information now i've got other commands here that kind of show aspects of that for example i've got just what cloud it's running in i've got one here that will show me just the tags so if i run that i can just see the tags for this virtual machine so this metadata service is super useful so within the os yes i could find out about the scheduled events but i don't have any for this virtual machine but it also lets me go and get a lot of really great
information about many many other things so when i think about hey i want to know if that scheduled event is coming i can look at the service health um i can actually go and query it from within the vm as well then there's unplanned hardware maintenance um bad things happen uh azure tries to preemptively detect when there's signs of a failure now if it does detect hey a failure is coming this process we talked about a second ago this live migration well it will use this again if it can if it's not that g h m or n it will actually try and live migrate it before the hardware failure now if it can't live migrate it then what happens well then we get unexpected downtime essentially at that point it can't live migrate the vm will be healed as quickly as it can it will be reinstantiated onto another host it's going to reboot in a crash consistent state obviously most likely depending on the failure i've probably lost my ephemeral my temporary storage so the contents of that temp drive that's probably gone as well remember to use availability sets availability zones use those kind of update domains to minimize impact use multiple regions in case there is some kind of regional outage now what about being an only child ordinarily i don't own the host ordinarily my vm gets created on a host but so do other tenants' virtual machines they get created on the same host the hypervisor is completely protecting them from one another i can't go and do something to someone else's virtual machine but there are other people on the same physical hyper-v host as me well sometimes i don't want that so while that's normally the case there are options where the host is dedicated to me as a customer so the first is to use an isolated vm an isolated vm there's nothing special except that essentially the vm is so big it just takes up the entire host so no one else can run on it with me so if we look at the isolated vm options here it shows the offerings so if i create any of these virtual
machines right here no one else will be on that node with me because it doesn't fit these vms take up the entire host so the easiest way to not share uh create something so big no one else fits on it then there's dedicated host so with dedicated hosts i essentially buy out an entire box and then i can fill it with vms of different sizes that are within the series that that particular host supports so if i look at dedicated host this is the pricing page so it's going to look at the detail if we scroll down on this page see all the different types you can see it's the das v4 type one available ram number of virtual cpus details about the processor and the cost so it's sort of six bucks an hour four bucks an hour depending on the size if i scroll down it'll actually show me the types of vms i can run so hey i can buy this das v4 type one well i can run all these different types of virtual machine until it's full i can't run any series it's linked down to a particular series of particular sizes and i can fill up the box until i kind of hit that number so it's still kind of on the regular network it's still just azure but now no one else can be on that box with me and obviously i want to make sure i'm filling it up or i'm kind of wasting money but now i can be assured if there's various regulatory or compliance requirements that box is just for me now i will say something if you use isolated vms or dedicated host you actually get another maintenance option remember i said before you get told maintenance is coming and i could self-service and move in advance well the reason they don't let you just pick a time to do the host maintenance is because there's other customers on that host with you well there aren't other customers on the host with you here so for both of these offerings i can actually set either do the maintenance now i'll pick when it does the downtime or i can pick a recurring maintenance window i think i have to have one every 35 days and it will just always
do the maintenance in that window so if i know sunday afternoon i'm good you can always do maintenance then i could set a maintenance window for my services and it would go ahead and do it then so i have more control of when it will actually do the maintenance because i'm not sharing with someone else then there's things like azure stack hub edge and hci these run on premises so this is you're not sharing because this isn't really public cloud azure anymore these are edge devices azure stack hub is a big turnkey appliance there's different sizes i think it's from 4 to 16 nodes that again i purchase it's hyper converged so there's no like separate disks the disks are in the nodes and then it uses software to make them resilient and replicate across the nodes there's azure stack edge more single u units with things like uh fpgas for really like cognitive services some of them have gpus in them hci is really just my own hardware now or from certain partners hci v2 is a special version that i pay for via azure subscriptions it's a special version of the os i install on it hub is completely isolated it runs on-prem it has a local azure portal for users and a local azure portal for admins edge is managed through azure hci is kind of a hybrid hci is really windows server 2019 hyper-v storage spaces direct with the windows admin center that's how i manage it but it also has azure arc built in azure arc is kind of that ability to take the azure management and bring it to things like um policy and tagging and rbac it can deploy aks clusters then have data services they're working on adding the ability to create vms from the azure portal to kind of the hyper-v on that in the future so this would be just another way to think about hey i want to run azure services and this is obviously way more than just virtual machines hci is vms but again with arc if i deploy aks then it's containers and then it's data services via the aks edge is really iot but there's
containers and it has vms azure stack hub is a whole set of different services so this is about azure consistency so right now it's showing us all of the services that it actually supports today it's obviously virtual machines app services functions aks is coming service fabric iot hub is in dev event hubs in dev key vault super useful azure storage blob queue managed disk and table storage and it's got a sql resource provider that lets me kind of run sql in vms and expose it out as paas and other types of capabilities so the whole point of azure stack hub is it's azure consistent it's not equal it can't do everything azure can do but the things it does do it does in a consistent fashion so if i needed uh like on premises maybe it's close to some anchor i've got some mainframe i have to have it close to that azure stack hub could be a great solution for that edge again is for those kind of edge scenarios it can show me storage from azure it's good for like that edge maybe cognitive services and send results up hci i just want to take advantage of kind of some of those azure models with my own sets of infrastructure there are bare metal offerings as well sap hana is a resource consuming beast there are large instances to support this there's also the ability to actually deploy it directly to bare metal so it's a special configuration in azure just for that i heart vmware it's not me personally but some people heart vmware so azure runs on hyper-v organizations may need to rapidly vacate data centers they don't have the time to retool so the azure vmware solution is a first party azure offering it's provided in partnership actually with vmware so they're certifying the configuration and the idea of this is essentially it's vmware hosts so now my physical boxes what gets exposed to me are i get these boxes with lots of kind of local storage in them so it uses vsan to provide replicated storage kind of across it's running esxi as a hypervisor it's using kind of nsx
for the networking so now i can just run vmware virtual machines in that environment i have to have a minimum i think it's three nodes and i can have between 3 and 16 nodes that make up a particular private cluster but for me as the admin i can now use vcenter i'm using my familiar tools to actually administer this what it's really doing behind the scenes if i had kind of a vnet a virtual network i'm really kind of connecting these with expressroute it's like a special expressroute over here to connect to vnets so maybe i'm running the vcenter in a vm in the vnet to go and manage um this environment if i had kind of my premises then i could think about well we're used to expressroute so again if i have my expressroute here to that microsoft backbone i can use expressroute and i'm using the expressroute global reach so expressroute global reach enables me to connect things by connecting different expressroutes together so essentially i'm kind of connecting it to this so now from here i could be running vcenter i can also use hcx so hcx is kind of this enterprise bulk solution i could vmotion vms so no downtime from here i could kind of extend the nsx my network into here i have these complete capabilities for my vmware environment so if i'm in a hurry i need to get out of my data center i don't have time to retool then the azure vmware solution could be a good fit i'm getting the entire host i use my familiar tooling i can just go and spin up my vmware then maybe later on i think about moving things to azure resource manager to go to different azure services but through this connection i can also get to things like azure storage all the other services there so if you heart vmware you definitely have that option okay so we understand virtual machines and i can't believe i talked about virtual machines for 90 minutes but virtual machine scale sets this really builds on the idea of virtual machines and a huge number of services use this behind the scenes
things like aks worker nodes i can build those based on vm scale sets things like azure firewall built on scale sets expressroute gateways built on scale sets because often i need more than one instance of something it's not hey there's a vm i don't want that the point with the cloud is i want multiple smaller instances so i can scale out and in because i pay for it when it's running so i have all these different instances and yes i could absolutely manually create them all but i don't want to do that i would like to be able to say hey there's this template create me this number of them and because the capacity may also vary over time it would be nice to somehow automatically change the number we have based on that variation so virtual machine scale sets enable me to pick a template and a configuration along with scale parameters so what i can think about here is i have a template so i can think okay so i have my image we often talk about the idea of this gold image we'll kind of try and color it in as gold as we can so we have the gold image so the image is obviously kind of our operating system it may have apps built into it or we could kind of push the apps as part of the overall configuration so we have an image and then we also have kind of a configuration that configuration could be things like extensions and those extensions could do things like well install apps via a custom script extension other types of agents whatever i need as part of that i would also have things like a disk configuration i would have a network configuration so i have a configuration for kind of what is the os instance and then i also combine that with kind of my scale requirements so i can think about a minimum number a maximum number a default and then what are my scale actions so what are my kind of scale out and scale in conditions that could be average cpu it could be iops it could be it's really working off of a queue and if the queue gets too big and they can't keep up we'll add
more of them so i have all of this and the net result is it's going to go and create the virtual machines so maybe my default count was 2 maximum was 10 minimum was one maybe i don't like the idea of one i'm not going to say that we're going to take two there as well and there are other things i can say i want to do az resiliency which spreads them over azs and how i spread them over fault domains and update domains but essentially it would kind of start off it's going to create them based on all of this configuration run scripts if it's windows it could join domains all of that stuff i can also configure it so that hey the chances are if i have multiple instances i need some way for things to get to it so i could automatically add them as part of kind of a back end set configuration of a load balancer so as it adds and removes them it will automatically add and remove from this load balance set and then because of this scale which could also be some kind of schedule they're running kind of hot there's lots of stuff happening hey it adds a new one oh things have quietened down okay delete it and it does delete it it doesn't shut it down and leave maybe a disk behind it deletes it deletes the vm deletes the disk all of it goes so i'm just paying for whatever i need at any given time again there's probably a managed disk might be multiple i have all those configurations so i'd be paying for the kind of disk as well so the point of a vm scale set is i set a configuration i set scale and i don't have to do this auto scale i could do manual scale i could change the numbers when i want them but the real beauty is hey let it do it for me let it go and change this based on what i need it's part of the configuration i have all of these great extensions to go and do all the work i need again i talked about that load balancer set that's kind of all goodness so let's see one of these so if i jump over here so i've actually got a bunch of vm scale sets but there's
really one i want to focus on and the reason i have a bunch is because i've got two aks clusters well aks clusters use vm scale sets as one of the options for their worker node pools but this one is vmss test and obviously it's vms underneath it's just creating them for me and what i'm going to do in here is i have an instance size now vm scale sets can actually use spot instances it has to be the entire cluster i can't mix them but i could actually say hey use spot instances for this particular cluster i really want to try and optimize my spend i have no idea why this is being so slow all right here we go so you can see hey just quick things the fault domains five you can see that here so i'm spread over five different fault domains now regular availability sets only spread out over up to three maximum i'm not doing that here you can see i'm actually running linux it could be linux or windows and you can see i've got a certain size that's the vm size underneath you could also see i have that option of an ephemeral os disk that's where it wouldn't use kind of that durable managed disk it's just going to use it locally on the node and so what i can see right now is i've just got one instance i'm cheap it's got this one instance running and i can see the details about it kind of the work it's been doing properties etc etc i'm going to go and look at my scaling i've got custom auto scale i'm not manually scaling it i did a custom and what we have here is if we go and look down firstly i've got these instance limits i can see i've got minimum maximum and default i've then got scale conditions actually based on this and what i've done is i'm scaling based on a metric and i've got a scale out i add more instances if a message count on an azure queue is more than five and i'm saying increase by one scale in i delete an instance if the message count is below 2 and then i'm obviously going to decrease by one i could add additional rules i can use all different
metrics i'm using my current resource i could use a queue i could use a service bus i could do something else i could measure it based on cpu and with each of these there's like a cooldown time and the reason we have a cooldown time is i don't want to do a scale action and then instantly scale again maybe my scale action hasn't had time to actually take effect yet so i need to give it time to cool down before i try and do another one so let's actually see this so right now i've got one instance because it's monitoring this queue there's only two messages now remember my rule was if the queue got more than five it should scale up by one so i'm going to add some messages now for all of those counts it's per instance that exists so what that means is i've now got six messages so if i look at my instances within kind of 30 seconds it's even quicker it's creating new ones now you'll notice it's creating two one of the options and it's the default option is actually to over provision and what that means is hey doing the scale action rather than just creating the one it needs it's going to go ahead and say hey i'm going to create two new ones whichever one finishes first and is ready to go is the one i'll keep and i'll go and delete this extra one that i was provisioning the reason it does that is in case something fails maybe the creation failed and it would slow down my scale it doesn't charge me for the extras that it is trying to do and again whichever the first one is that's the one that it will actually keep now i don't have to have that behavior if the application i have doesn't respond well to my instances getting created and then deleted straight away i can turn off that over provisioning option so it would just create one whatever the scale number is that's what it would actually create you can see here it's kind of going through and eventually once these finish one of them will get deleted it will just keep whichever one is kind of finished and
ready and responding first that's the one that will be kept and the other one will go away now it has a cooldown so it wouldn't do anything again anyway but there's only kind of six messages in that queue and my scale action is five but it's five per instance now there's two instances i'd have to have more than ten messages now you can see one of them went away it just kind of kept the one so i have that option if i look at my configuration i've got things like instance termination notification there's different policies as part of the property i'm trying to remember where that over provisioning is it's somewhere we'll find it as i start going through the various things but over provisioning maybe it's part of the scaling it is an option i can actually set so i can configure does it do that over provisioning or not and i can turn that off as part of the cluster so now because i've got two instances it is not going to scale again because my scale is it has to be more than five messages but that five is per instance now there's two i would have to have more than ten messages on the queue to make it scale again likewise if it was a cpu threshold and the cpu threshold was um 80 percent it'd have to be an average of 80 percent over all of the instances before it would do that scale so that's kind of the point of this but you kind of saw that auto scale in action and for fun i don't know how much fun it is um i can actually now start dequeuing these messages so if i just delete them and we make it go now below so we'll do it to three this would kind of prove a point so if there's two instances there's three messages my scaling was based on less than two well that would be four less than four because there's two instances so if we come back to this later on what we should see once kind of the cooldown time has gone we should actually see one of these will actually disappear it will actually start doing a deprovision on that so we'll come back we'll look at that
later on that will kind of prove it's times the number of instances so use scale sets i mean they're phenomenal that's how i'm going to really optimize my azure spend i really want to use these things so now some good to know things fault domains and zone balancing so with a scale set if i'm deploying it to a region that does not support availability zones it will use five fault domains and that's what we kind of saw in mine we saw there was five fault domains but i can control the spreading so if i use availability zones i deploy to availability zones i can kind of set this platform fault domain count to one it doesn't mean it's going to do one fault domain it will show it as one if i look but it's basically doing a max spread so what it will actually do is it will use as many as it actually can so if we go and quickly look at my scaling i can see i have things like a scale in policy um i have kind of my instance my configurations all of these various things kind of instance termination but what i'm actually doing is if i go back let's look at some of my properties i would actually have options around kind of this spreading so you can see these fault domains now i'm deployed at a regional level but i can have this capability to actually say hey i want to do this max spread and actually what we'll do as a bit of fun is we'll add a new one then we can kind of see some of this stuff so when i add a new virtual machine scale set we can see hey i pick the image again this could be from the marketplace if it's from the marketplace i can actually have up to a thousand vms using managed disks if it's a custom image i can have up to 16 uh sorry 600
you can see i can use azure spot now if it supported availability zones over here let's pick a region where it does i can actually pick the availability zones that kind of i want to use i'll say i'll deploy to all three availability zones i can say hey what type of disk do i want to use do i want ultra disk i can even here have that option to use my ephemeral os disk so don't go and store it actually to long-term storage i have my networking do i want to use a load balancer and then for scaling i can have my initial instance counts i can have scaling policies so minimum maximum scale out scale in we can change all of these afterwards i have my scale in policy which i'm going to talk about in a second but then under kind of my management here i can manage things around the upgrade policy identity of my scale set instance termination advanced this is where i can have things like hey i want more than 100 instances i can do my spreading algorithm so once again if i do my max spreading it will just try and use as many fault domains as it possibly can if i do fixed then i specify kind of what that value really is so i have all these different options around it let's go back to my scale set for a second and i've still got my two so still waiting for that cooldown to kind of kick in but eventually because we only have three items on the queue we should actually see some of these start to actually delete okay so i have that zone balancing and there's a strict option around the zone balancing or there's best effort spread with best effort if i'm using multiple zones it will try and make sure i'm spreading evenly over my availability zones but something might happen it may not be exactly spread evenly if i do strict things will actually fail which i probably don't want if it would mean there will be an inconsistency in the spreading over the availability zones it's kind of dangerous over provisioning i talked about already by default it will
actually spin up extras in case it fails but then it will go and delete the extras it doesn't charge me while it's doing that auto image update is cool so i talked about the idea that okay it's based on this image now remember that image could come from the azure marketplace where we can actually have up to a thousand instances then or it could be from the shared image gallery i.e it's ours and i think you should check i think it's 600 as a max when i use that with managed disks and both of those have versions so you think about hey i deployed this and it was with v1 well now uh a v2 has been added that v2 has this month's patches in it and if i've turned on that image auto upgrade what it will actually do is it will go through never more than 20 percent so never more than a fifth of my workloads and it will basically replace them with a new version now again these are cattle the idea of things like virtual machine scale sets is there's really nothing unique or special about them it's not going to go and keep its temporary drive it will set up a new one deleting the old one but now if this was kind of v1 it's going to kill it and deploy a new one because that gets deleted with the new version of the image it's just going to do that for me automatically so if i think about hey i need to keep my environment up to date for my vm scale set i don't have to worry about patching anymore so once again if we go and look we can see as kind of part of my operating system over here i have this automatic os upgrades now i kind of set this to off when i did it and based on the image i used it said not even available but if you pick kind of the right images that have those versions i would absolutely have this automatic os upgrade option and it would then go through and actually update the image for me as soon as there's a new version it would automatically just go and start rolling it out i wouldn't actually have to go and do anything it's still
running i'm still waiting for it to go and delete one of those things so it can be marketplace or your own just make sure if it's your own you have to use the shared image gallery and as you increase the version it will go and roll it out instance repair so here we can think about okay we don't really care about the vm we care about your app so you have your app running in here and as part of that your load balancer to check if it's healthy has a health probe and the health probe is talking to the app in some way maybe it's a certain port expects a certain response so what i can actually say is by that health probe or there's an application health extension if i don't get the right response back so it's not telling me it's healthy it will heal it and the way it heals it is it will delete it it will delete it and it will redeploy hey it wasn't responding it will redeploy one in its place run all that config all the extensions to get the app back in then add it back to the load balancer for the healing so you can think about this was kind of the image update it would do up to 20 percent for the healing it will never do more than 5 percent so it only goes through kind of one at a time delete redeploy to bring it healthy so if the load balancer probe or if i'm using it that application health extension is not returning a positive response then it will actually go and heal it for me scale in policy so this is actually when i'm thinking about well how do i actually bring the things back in when i'm actually doing my scaling so in here i'm going to refresh it again still there if i go to my scaling you can see i have a scale in policy so this is saying well when it deletes things how does it delete them so by default it's going to try and keep a good balance over the fault domains and the availability zones so it will just fairly evenly delete to keep a good balance or i can say no no delete the newest one or i could say no no delete the oldest one so i can pick how it actually goes and performs
the deletions those scale in actions when it's actually running and here just for fun we can kind of go and see hey the instance count over time it also does write an activity log every time it does a scale operation so we'd kind of see it's flapping right now because i'm probably messing around but i would kind of see various auto scale there we can see the scale up happened it will say hey i'm initiating and then completed and if you actually look at it it will show you it went from one in this case to two so it's actually a good way to go and monitor what's happening in your scale i can go and check out the activity logs for my scale set i actually wrote some powershell to go and do that for me so i can kind of see those things in action so we can control the scale in policy i can do termination notification so this is interesting that you think about okay they get scaled in they just get deleted but what if i need to do something before it gets deleted imagine this was a windows vm and it was joined to the domain well i need to unjoin it from the domain first to keep things clean maybe there's just some kind of something i want to do so as part of my vm scale set configuration i can actually turn on terminate notification and i set a time i think it's between 5 and 15 minutes so now what happens is when it knows it's going to scale it in the vm can actually go and query that endpoint that i talked about earlier and it would tell me if it's waiting for a terminate notification so here's some example code and it's using exactly the same endpoint as i got that other data from essentially what i'm doing is i'm querying it and i'm asking for scheduled events and here's an example of a scheduled event if it was going to get terminated you can see the event type is terminate and it's scheduled so what i would do is i would have some task actually running inside that guest that is periodically querying that metadata endpoint and then if i see hey look there's a
terminate i would go and do whatever it is i need to do to cleanly clean up and now i'm ready for deletion so it gives me just that option to kind of hey look i see the delete's coming and then we have instance protection now the point is no vm is special but in kind of the george orwell uh animal farm maybe some are a bit more special than others so i can protect certain virtual machine instances that are part of the scale set so here when i'm looking at my instances if i actually just go and look at one you can see there's a protection policy so what i can do here is i could say protect from scale in so i could still go and manually do things and upgrades would still delete it or i can say protect it from scale set actions completely so here upgrades wouldn't impact it this is protected it would not get removed so if i do have maybe one that's a bit more special than others well i can go in and give it that protection and i know why it's not scaling my math is poor it's rounding it three would be like one and a half and it's rounding it up so if i dequeue one more um then remember if we look at the instance conditions because i really want you to see it scale down my scaling condition was if it's less than two so there's still two in there but to prove it's less than two each now it really should scale down yes again it was one and a half because there were three it was rounding it up that's why it wasn't scaling but now if we look at the instances now i've deleted one now we can see it's deleting so it was less than two but there's still two but if you divide 2 by 2 it's 1 so it's less than 2 it's now doing a scale action so that's why it wasn't doing it it was one and a half each and it was rounding it up so poor math on my part but now you can see it's actually deleting it it's removing that instance and it's gone so we've scaled back down and again if we looked at kind of the scaling and the run history we can see it's kind of
bouncing up and down and we would even see a new activity log scale down just happened and to really show we were there so super super powerful you would definitely want to use auto scaling it's going to optimize our spend make sure you pick the right scale action it's no good saying hey i'm going to auto scale on cpu if it's iops intensive or it's memory intensive or it's doing the same kind of queue i have to make sure i understand the workload to make sure i pick the right scale action that will actually change my instances based on the work it's doing so i hope this was useful and as always this was bigger than i anticipated it to be but please go and ask in the comments below and once again there's a lot of work to create these so really appreciate if you could like um subscribe and just share this with people you know but until next time stay safe take care
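
The per-instance arithmetic from the queue demo can be sketched in a few lines of Python. This is only a model of what the video describes, not Azure's actual autoscale implementation: the thresholds (more than five to scale out, fewer than two to scale in) come from the demo's rules, the rounding-up step is the speaker's own inference from what he observed, and the function names and the minimum and maximum of 1 and 10 are assumptions made for illustration.

```python
import math

# Thresholds taken from the demo's autoscale rules.
SCALE_OUT_ABOVE = 5  # scale out when messages per instance is more than 5
SCALE_IN_BELOW = 2   # scale in when messages per instance is less than 2


def per_instance_count(messages: int, instances: int) -> int:
    """Messages per instance, rounded up (an assumption based on the demo,
    where 3 messages over 2 instances behaved like 2, not 1.5)."""
    return math.ceil(messages / instances)


def scale_decision(messages: int, instances: int,
                   minimum: int = 1, maximum: int = 10) -> int:
    """Return +1 (add an instance), -1 (remove one) or 0 (do nothing).

    minimum and maximum are hypothetical instance limits for this sketch.
    Cooldown handling is deliberately left out.
    """
    per_instance = per_instance_count(messages, instances)
    if per_instance > SCALE_OUT_ABOVE and instances < maximum:
        return 1
    if per_instance < SCALE_IN_BELOW and instances > minimum:
        return -1
    return 0


if __name__ == "__main__":
    # Replay the steps from the demo: (messages on the queue, instances).
    for messages, instances in [(6, 1), (6, 2), (3, 2), (2, 2)]:
        print(messages, instances, scale_decision(messages, instances))
```

Replaying the demo: six messages on one instance triggers a scale out, six messages on two instances does nothing, three messages on two instances also does nothing because one and a half rounds up to two, and two messages on two instances finally triggers the scale in.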
Info
Channel: John Savill's Technical Training
Views: 13,366
Rating: 4.9922481 out of 5
Keywords: azure, azure cloud, azure iaas, iaas, vm, virtual machines, vm scale sets, vmss, maintenance
Id: LLhzCgIJMdo
Length: 121min 52sec (7312 seconds)
Published: Tue Oct 20 2020