VMware vCenter SRM: Testing a Recovery Plan

Captions
Welcome back to the VMware vCenter Site Recovery Manager 5 video series. My name is Andrew Elwood, and I'm a senior technical instructor with VMware Education Services.

In the last module we looked at creating a recovery plan. A recovery plan is essentially a script that tells the SRM environment exactly which virtual machines to fail over from the protected site to the recovery site, what sequence to start them up in, and what any dependencies may be. We've also got some other parameters in there, like which virtual machines we're going to suspend to make room for the incoming virtual machines when we actually have the failover.

One of the really powerful features of SRM is the ability to test that recovery plan. Now that you've taken all the time and effort to build the thing, you can run through a test where you see exactly how the dependencies work, and you can actually log in to the servers on the recovery site to check them. But do you really want to do that while you're still running production? Some folks are required to test because of government regulations, or simply because of internal IT policies. One of the useful things about testing a recovery plan is that the test doesn't turn off the original virtual machines we're failing over; they keep running at the protected site.

We start with virtual machines at the protected site that are marked for replication and for recovery in the event of a failure. When you execute the recovery plan, the first step is to synchronize the storage. In other words, anything written to disk at the protected site that has not yet been replicated is flushed and replicated, so that the recovery site has the most up-to-date information. Next, if you have any ESXi hosts in standby mode because you happen to be running vSphere Distributed Power Management
(DPM), those hosts are brought out of standby mode. Non-critical virtual machines are then suspended; those are the VMs you defined as non-critical within your recovery plan. The placeholder virtual machines are replaced by the recovered virtual machines. Placeholder virtual machines are nothing more than a place to store which network, which resource pool, and which virtual machine folder we want to start those VMs up in; in other words, what home they'll have when they land on the recovery site. The virtual machines that are failing over are also configured to use the recovery storage and the test networks, and that's configured within the recovery plan itself.

So when we go through testing, you can define a dedicated test network. Suppose I fail over a virtual machine from New York to Chicago while the original VM it's based on is still running in New York. If you had stretched VLANs in place, so that all of the IP addresses are visible in both New York and Chicago, what would happen if you brought up that recovered VM with exactly the same IP address as the parent VM it was based on? Clearly we would have an issue. So one of the very cool features is that SRM can automatically create a test network, or we can define our own test network for testing the recovery process. That's for test mode, of course; in full failover mode we want the VM to come up on the network we mapped it to for recovery.

This demonstration goes through that process. We're going to perform a test failover in our environment using the recovery plan we built during the last segment, show you the results, and then show you how to clean up after the test failover.

So now that we've got our recovery plan all built, one of the features that we have within our
recovery plans is the ability to run a test recovery, and that's a very powerful tool. A lot of people have disaster recovery plans on pencil and paper, or perhaps stored in a spreadsheet somewhere, but the reality is that those plans are almost impossible to test. Being able to test is one of SRM's most valuable features.

Here we're first verifying that this is the recovery plan we want to test. There are the individual steps; we can look at them just as we did when we built the recovery plan. Yes, this is the one I'm interested in, and there's the prompt I added in the last module. Once we're happy that this is the right plan, we simply click the blue play button next to the Test item above the recovery plans pane.

Notice that the wizard tells us what it's about to do. One of the options at the bottom is storage replication: it takes the opportunity to flush recent changes across to the recovery site, so that when we start up, each virtual machine has the most recent changes that were made under the covers. We simply click Next, and if we expand the steps in the dialog box, we can watch the various steps as they run in real time.

The first four steps, such as Synchronize Storage, Restore Hosts from Standby, and Create Writable Storage Snapshot, are all necessary to get to the point where we can start up our virtual machines in the test environment. Notice that when we reach step 5, Power On Priority 1 VMs, the current state is Running. If we look down the individual details under 5.1.1, sub-steps like Configure Storage and Configure Test Network were all successful. In the meantime, we're simply waiting for the
virtual machine's guest operating system to start up. Overall progress is at 77% now, with the Wait for VMware Tools step at 11%. When we built our recovery plan, one of the optional settings we could have added was exactly how long to wait for VMware Tools before we consider the step failed and move on to the next item in the list. In this case we're going to wait for VMware Tools; we've tested this already and know that VMware Tools will in fact start up. It just takes a few moments, so through the magic of editing we'll skip ahead and reconvene when the next step starts.

The virtual machine has successfully started up, and notice that here's my prompt: check the DB server. It's up to you to jump into whatever tools you think are appropriate. In this case I'm just going to use the vSphere Client to find my failed-over database server. We dig down through the Recovered Services folder, because that's where it should show up (don't forget we did that inventory mapping), go under Database Servers, and there it is: it's launched.

That's great, but I'm pretty sure all of us have seen virtual machines with a good power symbol where the guest operating system isn't completely functional. So we'll open a console and check. Sure enough, there it is at a login prompt. We'll log in as root just to verify that this system is in fact functional; if it were a real database server, I might run some database integrity tests or something similar.

Be aware that one item we didn't discuss in detail when we built that prompt is that I can set a finite timeout, so if an administrator doesn't pay attention to the prompt and doesn't respond, that doesn't mean the
recovery plan stops indefinitely. In this case there was a five-minute wait time as the default. We're going to dismiss the prompt now, which lets the recovery plan continue with the next steps.

Again, we watch these virtual machines as they spin up. We go through the priority 2 virtual machine, which in our case is the app server; notice that the priority 3 step just sits waiting while the priority 2 VM continues to start. Once the priority 2 virtual machines have started up, we move on to fire up the priority 3 VMs. In this case we have only one, the web server, and it goes through the same routine as the previous ones.

Once the recovery plan has completed, you get a success message saying the test is complete. Clearly, at this point the best thing to do is test the validity of your entire environment; don't forget that your virtual machines may have started up on a different network, because that's what you specified. We can also look through the recovery plan for errors. Any errors that occurred during the run are listed in red, and you can then investigate why; maybe VMware Tools failed to start for some reason. You should also go out and actually find the virtual machines in the inventory. Here we're looking through our Recovered Services folder, and we see each of the virtual machines failed over to the appropriate target location according to our inventory mappings.

So, short version: go ahead, have a look, and test the validity of this thing. Testing that your VMs started up is one thing, but testing how the tiers of a multi-tier application communicate with each other at the recovery site is probably more important to actually
evaluate. So go through those steps, and once you've finished all of that testing, come back to SRM and decide whether you're happy with the results. If you are, do the cleanup: that's simply a matter of clicking the blue Cleanup option near the top of the screen and answering the questions in the wizard, which basically confirms what the cleanup is going to do. Cleanup powers off the test VMs, resumes any non-critical VMs that you had suspended (in our case we didn't suspend any), and cleans up the necessary storage components. At that point you're done with testing.

I think the ability to test a recovery plan before the company actually has to rely on it is one of the most powerful features we've got. I don't know about you, but if I make a mistake in something, it's a lot easier to find and recover from that mistake in a test than when it really means the company is losing money. So in my book, it's one of the most powerful features we've got.

On that note, if you'd like to learn more about this feature, along with the rest of the configuration of SRM 5 and how to deploy it in your environment, go to vmware.com/education. The class you're looking for is the SRM 5 Install, Configure, Manage class.
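The test sequence narrated above (synchronize storage, restore hosts from standby, suspend non-critical VMs, create a writable storage snapshot, power on priority groups in order with a VMware Tools wait, show prompts, then clean up) can be sketched as a small simulation. This is a hypothetical Python model, not SRM's API: it talks to no vCenter or SRM server, and the names `RecoveredVM`, `run_test_recovery`, `cleanup_test`, and the VM names are invented for illustration. The 300-second defaults mirror the five-minute figure mentioned in the video, and the sketch handles VMs within one priority group one after another for simplicity, which may not match how SRM actually schedules them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RecoveredVM:
    """Hypothetical model of one protected VM in a recovery plan (not an SRM API object)."""
    name: str
    priority: int                  # priority group; group 1 powers on first
    tools_start_secs: float        # simulated time until VMware Tools reports running
    prompt: Optional[str] = None   # optional message shown after this VM powers on


def run_test_recovery(vms: List[RecoveredVM],
                      suspend: Optional[List[str]] = None,
                      tools_timeout: float = 300.0,
                      prompt_timeout: float = 300.0) -> List[str]:
    """Simulate the order of steps in a test recovery and return them as a log."""
    log = [
        "Synchronize storage",
        "Restore hosts from standby",
        f"Suspend non-critical VMs: {', '.join(suspend or []) or 'none'}",
        "Create writable storage snapshot",
    ]

    # Group VMs by priority; input order is kept within a group.
    groups: Dict[int, List[RecoveredVM]] = defaultdict(list)
    for vm in vms:
        groups[vm.priority].append(vm)

    # A priority group starts only after every VM in the previous group is handled.
    for priority in sorted(groups):
        for vm in groups[priority]:
            log.append(f"P{priority}: configure storage and test network for {vm.name}")
            log.append(f"P{priority}: power on {vm.name}")
            if vm.tools_start_secs <= tools_timeout:
                log.append(f"P{priority}: VMware Tools running on {vm.name}")
            else:
                # Timeout: record a warning and keep going instead of halting the plan.
                log.append(f"P{priority}: WARNING VMware Tools timeout on {vm.name}")
            if vm.prompt:
                log.append(f"P{priority}: prompt '{vm.prompt}' "
                           f"(continues after {prompt_timeout:.0f}s if not dismissed)")
    log.append("Test complete")
    return log


def cleanup_test(vms: List[RecoveredVM], suspend: Optional[List[str]] = None) -> List[str]:
    """Simulate the cleanup steps named in the video, in the order they are described."""
    log = [f"Power off test VM {vm.name}" for vm in vms]
    log += [f"Resume non-critical VM {name}" for name in suspend or []]
    log.append("Clean up test storage")
    return log


if __name__ == "__main__":
    # Hypothetical three-tier plan matching the demo: DB (1), app (2), web (3).
    plan = [
        RecoveredVM("db-server", 1, 60, prompt="Check the DB server"),
        RecoveredVM("app-server", 2, 45),
        RecoveredVM("web-server", 3, 30),
    ]
    for line in run_test_recovery(plan) + cleanup_test(plan):
        print(line)
```

The timeout branch logs a warning and moves on rather than stopping, which is the behavior the narration describes for an optional VMware Tools wait; in the real product, errors from such steps would appear in red in the recovery plan's history.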
Info
Channel: VMware
Views: 21,698
Keywords: VMware, SRM, vCenter, vCenter Site Recovery Manager, Site Recovery Manager, Disaster Protection, Getting Started with SRM, Testing a Recovery Plan, SRM 5
Id: auGJmxv9Qao
Length: 11min 15sec (675 seconds)
Published: Tue Oct 02 2012